(x86 asm): Zero the high parts of the ?MM registers if available.
author Mark Wooding <mdw@distorted.org.uk>
Thu, 23 Aug 2018 04:13:55 +0000 (05:13 +0100)
committer Mark Wooding <mdw@distorted.org.uk>
Thu, 23 Aug 2018 06:23:31 +0000 (07:23 +0100)
commit b9b279b4105524d5d4e5dcd389141645d904aa0c
tree f9cd36b38a033e1619e3b9f6e97ae041d0d3f37b
parent 2921991916ba2362d054111a0d041ff170c899c1

There's a performance penalty to trying to preserve the upper parts of
the SSE/AVX vector registers, and it's pointless because we don't need
to preserve them.  (Earlier AVX-capable processors would carefully snip
off the upper parts of the registers and put them in a box, and then
glue them back on when they were wanted, which isn't so bad.  Later
processors instead just track the upper part of the register as an
additional operand, which leads to unnecessary latency.)

Add AVX-specific entry points to the necessary routines, and call them
when AVX is detected.  This would all be easier if Intel had chosen the
encoding for `vzeroupper' from an existing `nop' encoding space, so that
older processors would simply ignore it.
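
The dispatch idea can be sketched in C as follows.  This is only an
illustration, not the code from this commit: the names `core_sse2',
`core_avx', `cpu_has_avx' and `pick_core' are hypothetical, the real
routines are written in assembler, and the sketch assumes GCC or Clang
(for `__builtin_cpu_supports' and the inline `asm' syntax).  On other
compilers or architectures it falls back to the plain entry point.

```c
/* Hypothetical stand-in for an SSE2 routine; the real ones live in the
 * assembler files and do actual work. */
typedef int (*core_fn)(int);

int core_sse2(int x) { return x + 1; }

/* AVX-specific entry point: zero the upper parts of the vector
 * registers, then continue into the ordinary SSE2 code.  Only call
 * this when AVX has been detected, because `vzeroupper' is an invalid
 * instruction on processors without AVX. */
int core_avx(int x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __asm__ volatile ("vzeroupper");
#endif
  return core_sse2(x);
}

/* Report whether the processor (and operating system) support AVX.
 * Assumes the GCC/Clang x86 builtins; anywhere else, answer no. */
int cpu_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx") != 0;
#else
  return 0;
#endif
}

/* Choose the entry point once, according to the detected features. */
core_fn pick_core(int have_avx)
{
  return have_avx ? core_avx : core_sse2;
}
```

A caller would write something like `pick_core(cpu_has_avx())(41)' and
get the same answer from either entry point; the only difference is
whether the upper register halves were cleared on the way in.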
13 files changed:
base/dispatch.c
base/dispatch.h
math/mpmont.c
math/mpx-mul4-amd64-sse2.S
math/mpx-mul4-x86-sse2.S
math/mpx.c
symm/chacha-x86ish-sse2.S
symm/chacha.c
symm/rijndael-base.c
symm/rijndael-x86ish-aesni.S
symm/rijndael.c
symm/salsa20-x86ish-sse2.S
symm/salsa20.c