A couple of other changes to ease the way:
  * Split `gcm_mulk_...' into two endianness variants, so that
    CPU-specific variants don't have to track what's going on through
    the key table.
  * Abstract out `recover_k' to decode the key value from a table, for
    the use of `gcm_concat'.  This is, of course, necessary if the table
    format is CPU-dependent.
  * Add testing to make sure that `mktable'/`recover_k' agree with each
    other.
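
To illustrate the `mktable'/`recover_k' agreement being tested, here is a
minimal sketch.  It is not the library's code: the table layout (just the
key, held as four 32-bit words), the `table_fmt' selector, and all of the
signatures below are assumptions made up for the example.  The real
table format may well hold more than the key itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 16
#define KEY_WORDS (KEY_BYTES/4)

/* Hypothetical table formats: the key words are loaded either
 * big-endian or little-endian.  Not the library's real layout.
 */
enum table_fmt { FMT_BIG, FMT_LITTLE };

static uint32_t load32_b(const unsigned char *p)
{
  return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

static uint32_t load32_l(const unsigned char *p)
{
  return (((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
          ((uint32_t)p[1] << 8) | (uint32_t)p[0]);
}

static void store32_b(unsigned char *p, uint32_t w)
{
  p[0] = (unsigned char)(w >> 24); p[1] = (unsigned char)(w >> 16);
  p[2] = (unsigned char)(w >> 8);  p[3] = (unsigned char)w;
}

static void store32_l(unsigned char *p, uint32_t w)
{
  p[3] = (unsigned char)(w >> 24); p[2] = (unsigned char)(w >> 16);
  p[1] = (unsigned char)(w >> 8);  p[0] = (unsigned char)w;
}

/* Build the (toy) table from the raw key bytes. */
void mktable(enum table_fmt fmt, uint32_t *ktab, const unsigned char *k)
{
  int i;

  for (i = 0; i < KEY_WORDS; i++)
    ktab[i] = fmt == FMT_BIG ? load32_b(k + 4*i) : load32_l(k + 4*i);
}

/* Inverse of `mktable': decode the key bytes back out of the table. */
void recover_k(enum table_fmt fmt, unsigned char *k, const uint32_t *ktab)
{
  int i;

  for (i = 0; i < KEY_WORDS; i++) {
    if (fmt == FMT_BIG) store32_b(k + 4*i, ktab[i]);
    else store32_l(k + 4*i, ktab[i]);
  }
}

/* Return nonzero if `recover_k' undoes `mktable' for this key. */
int roundtrip_ok(enum table_fmt fmt, const unsigned char *k)
{
  uint32_t ktab[KEY_WORDS];
  unsigned char out[KEY_BYTES];

  mktable(fmt, ktab, k);
  recover_k(fmt, out, ktab);
  return (memcmp(k, out, KEY_BYTES) == 0);
}

/* Check both formats against an arbitrary fixed key. */
void selftest(void)
{
  unsigned char k[KEY_BYTES];
  int i;

  for (i = 0; i < KEY_BYTES; i++) k[i] = (unsigned char)(17*i + 3);
  assert(roundtrip_ok(FMT_BIG, k));
  assert(roundtrip_ok(FMT_LITTLE, k));
}
```

The point of the round trip is that a caller such as `gcm_concat' only
ever needs `recover_k', and so never has to know which table format is
in use.
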
There are currently no fancy CPU-specific implementations, but you can
tell what's coming.  No actual functional change, except for extra
logging if you set `CATACOMB_CPUDISPATCH_DEBUG' in the environment.