Support Intel's AES Native Instructions where available on x86 hardware.
* Add a detector for the CPU feature.
* Implement AES in terms of the Intel AESNI instructions.
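For illustration, here is a minimal sketch of such a detector. It assumes GCC's or Clang's `<cpuid.h>` and is not necessarily how Catacomb does it. CPUID leaf 1 reports AESNI in bit 25 of ECX, which `<cpuid.h>` names `bit_AES`. The function name `cpu_has_aesni` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: report whether the CPU advertises AESNI.
 * CPUID leaf 1 sets bit 25 of ECX (bit_AES in <cpuid.h>) when the
 * AES instructions are available. */
static bool cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_AES) != 0;
#else
    /* Not x86 at all, so AESNI cannot be present. */
    return false;
#endif
}
```

A caller would typically run this once at initialization and cache the answer to choose between the AESNI and generic code paths.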
We can't use the fancy instructions to implement Rijndael with blocks
larger than 128 bits, unfortunately, since the AESNI round instructions
operate on a fixed 128-bit state; we /can/ (and do) use the rather
cumbersome key-scheduling instructions.
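As a rough illustration of why the key scheduling is cumbersome, here is a sketch of AES-128 written with the AESNI intrinsics. It assumes GCC or Clang (`<wmmintrin.h>`, `__attribute__((target))`). The function names are hypothetical, and this is not Catacomb's implementation. AESKEYGENASSIST only supplies the RotWord/SubWord/round-constant term, and it takes the round constant as an 8-bit immediate. The XOR chaining across words has to be done by hand, and the schedule has to be unrolled rather than looped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <emmintrin.h>
#include <wmmintrin.h> /* AESNI intrinsics */

#define AESNI_TARGET __attribute__((target("aes,sse2")))

/* One AES-128 key-expansion step: broadcast AESKEYGENASSIST's word 3
 * (RotWord(SubWord(w3)) ^ rcon) and XOR it into the running prefix-XOR
 * of the previous round key's four words. */
static AESNI_TARGET __m128i expand_step(__m128i key, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* The round constant must be a compile-time immediate, hence a macro. */
#define EXPAND(k, rcon) expand_step((k), _mm_aeskeygenassist_si128((k), (rcon)))

/* Hypothetical helper: expand a 16-byte key into 11 round keys. */
static AESNI_TARGET void aes128_expand(const uint8_t key[16], __m128i rk[11])
{
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    rk[1] = EXPAND(rk[0], 0x01);
    rk[2] = EXPAND(rk[1], 0x02);
    rk[3] = EXPAND(rk[2], 0x04);
    rk[4] = EXPAND(rk[3], 0x08);
    rk[5] = EXPAND(rk[4], 0x10);
    rk[6] = EXPAND(rk[5], 0x20);
    rk[7] = EXPAND(rk[6], 0x40);
    rk[8] = EXPAND(rk[7], 0x80);
    rk[9] = EXPAND(rk[8], 0x1b);
    rk[10] = EXPAND(rk[9], 0x36);
}

/* Hypothetical helper: encrypt one 16-byte block with the round keys. */
static AESNI_TARGET void aes128_encrypt_aesni(const __m128i rk[11],
                                              const uint8_t in[16],
                                              uint8_t out[16])
{
    __m128i s = _mm_loadu_si128((const __m128i *)in);

    s = _mm_xor_si128(s, rk[0]);          /* initial AddRoundKey */
    for (int i = 1; i < 10; i++)
        s = _mm_aesenc_si128(s, rk[i]);   /* nine full rounds */
    s = _mm_aesenclast_si128(s, rk[10]);  /* last round: no MixColumns */
    _mm_storeu_si128((__m128i *)out, s);
}
#endif

/* Returns 0 on success, or -1 (leaving ct untouched) without AESNI. */
int aes128_encrypt_block(const uint8_t key[16], const uint8_t pt[16],
                         uint8_t ct[16])
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;

    if (__get_cpuid(1, &a, &b, &c, &d) && (c & bit_AES)) {
        __m128i rk[11];
        aes128_expand(key, rk);
        aes128_encrypt_aesni(rk, pt, ct);
        return 0;
    }
#else
    (void)key; (void)pt; (void)ct;
#endif
    return -1;
}
```

With the FIPS-197 Appendix C.1 vector (key `000102...0f`, plaintext `00112233...ff`), this should produce `69c4e0d86a7b0430d8cdb78070b4c55a` on hardware that has AESNI.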
There's a slightly annoying endianness difference between Catacomb
(big-endian) and AESNI (little-endian). Resolve this by (a) maintaining
the key schedule in little-endian order if we're using AESNI (and blocks
are exactly 128 bits); and (b) end-swapping the block on entry to and
exit from the block cipher operations.
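To make (a) and (b) concrete, here is a portable sketch. It assumes Catacomb's words are built from bytes most-significant-first. Byte-swapping such a word gives exactly the little-endian reading of the same four bytes. On a little-endian x86, the swapped words therefore sit in memory in AES byte order, which is what AESNI's 128-bit loads expect. All the names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read four bytes most-significant-first (big-endian), as a
 * Catacomb-style word is assumed to be built here. */
static uint32_t load32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read four bytes least-significant-first (little-endian), which is
 * how an x86 load sees the same bytes in memory. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Hypothetical helper: end-swap n words in place.  Applied to a block
 * on entry and exit, or once to a key schedule at setup time, it turns
 * big-endian words into little-endian ones, so that
 * swap32(load32_be(p)) == load32_le(p) for every 4-byte group p. */
static void swap_words(uint32_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        w[i] = swap32(w[i]);
}
```

Doing (a) once, when the key is set up, means the only per-block cost is the pair of swaps in (b).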