* The carry loop is wrong if the destination length is an exact
multiple of four limbs. Fortunately, it isn't.
* The initial pass feeds into the main loop unconditionally, unlike
`mpxmont_mul4_...' (from which I think the commentary was
uncritically copied), so being at the end of it doesn't tell you
anything about whether to start another. And, indeed, we do check
the loop-end condition.