math/gfreduce.[ch]: Fix out-of-bounds memory access.

The final pass of the reduction adds a multiple of the extra top bits
from the most significant word. At this point, however, the generated
instruction sequence accesses the word immediately below the start of
the supplied memory vector. While it (probably) won't change this
word's value, it still reads the word and writes it back.
This is relatively harmless: typically the vector will have been
allocated from our custom arena, so there'll be a header word in this
position. Hand-built polynomials may cause trouble, though.
Fix this bug by keeping track of the first instruction which accesses a
word other than the least significant, and using this alternative entry
point in the final pass. Fortunately, there's an unused slot in the
context structure which we can use for this purpose!
(Yes, the previous refactoring was largely for the purpose of fixing
this bug.)
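
The idea behind the fix can be sketched roughly as follows. This is an
illustrative toy model only, not the actual gfreduce code: the names
`struct instr', `struct prog', `fin', `set_final_entry' and `run' are
all hypothetical, and it assumes the instructions are ordered so that
those touching the least significant word come first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction: XOR a shifted copy of the word
 * being reduced into the word `off' places above the lowest word that
 * the program touches. */
struct instr { size_t off; unsigned shift; };

struct prog {
  const struct instr *iv;	/* full instruction vector */
  size_t n;			/* number of instructions */
  size_t fin;			/* alternative entry point, final pass */
};

/* Record the index of the first instruction which touches a word other
 * than the least significant one (assumed ordering: `off' ascending). */
static void set_final_entry(struct prog *p)
{
  size_t i = 0;

  while (i < p->n && p->iv[i].off == 0) i++;
  p->fin = i;
}

/* Run instructions [start, n) on vector V, folding in the word W.  The
 * lowest word of the program sits at index BASE, which may be -1 in
 * the final pass; entering at `fin' then skips every instruction that
 * would touch v[-1]. */
static void run(const struct prog *p, size_t start,
		unsigned long *v, ptrdiff_t base, unsigned long w)
{
  size_t i;

  for (i = start; i < p->n; i++)
    v[base + (ptrdiff_t)p->iv[i].off] ^= w << p->iv[i].shift;
}
```

In this model, the main pass would call run(&p, 0, v, base, w) with
base >= 0, while the final pass would call run(&p, p.fin, v, -1, w), so
that only words inside the vector are read or written.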