base/asm-common.h, *.S: Use consistent little-endian notation for SIMD regs.

This makes operations which involve changing one's perspective on the
SIMD processing elements (for instance, treating four 32-bit elements
as two 64-bit ones) much easier to follow. In particular, I hope that
this removes a layer of brain-twisting from the GCM code.

* Adjust all of the register-contents diagrams so that less
significant elements are on the right, rather than on the left.
* Change the x86 `SHUF' macro so that the selected elements are listed
in order of decreasing significance, so `SHUF(3, 2, 1, 0)' would be a
no-op.

I would, of course, continue to use big-endian notation on a target
which actually used big-endian ordering natively, but we don't
currently support any such targets.