math/mpx-mul4-*-sse2.S (squash): We don't care about the top half of c3 here.
author Mark Wooding <mdw@distorted.org.uk>
Sat, 27 Oct 2018 09:43:24 +0000 (10:43 +0100)
committer Mark Wooding <mdw@distorted.org.uk>
Sat, 27 Oct 2018 09:47:45 +0000 (10:47 +0100)
The previous version of the comment erroneously claimed that the top
half of c3 held y_1; in fact it holds y_2, but we'll clobber it anyway
because the objective is to carry up into y_1, so mark it as
don't-care (like lo).
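
To see why the top half of c3 is don't-care, here is a rough Python model
of the four instructions touched by this comment.  It is a sketch, not
part of the commit: the helper names (pack, lanes, extract_step) are
hypothetical, registers are modelled as 128-bit integers with two 64-bit
lanes, and B = 2^32 is assumed from the 32-bit shift.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def pack(lo, hi):
    """Build a 128-bit register value from two 64-bit lanes (low, high)."""
    return (lo & MASK64) | ((hi & MASK64) << 64)

def lanes(x):
    """Split a 128-bit register value into its (low, high) 64-bit lanes."""
    return (x & MASK64, (x >> 64) & MASK64)

def psrldq(x, nbytes):
    """Shift the whole 128-bit register right by nbytes bytes, zero fill."""
    return (x >> (8 * nbytes)) & MASK128

def psrlq(x, nbits):
    """Shift each 64-bit lane right by nbits bits independently."""
    lo, hi = lanes(x)
    return pack(lo >> nbits, hi >> nbits)

def extract_step(t):
    """Model the four instructions; t starts as (y_0, y_2)."""
    c3 = t                  # movdqa c3, t   -> (y_0, ?)  top lane is y_2
    lo = t                  # movdqa lo, t   -> low 32 bits are y^*_0
    t = psrldq(t, 8)        # psrldq t, 8    -> (y_2, 0)
    c3 = psrlq(c3, 32)      # psrlq  c3, 32  -> (floor(y_0/B), ?)
    return c3, lo, t

# Hypothetical sample values, chosen only to make the lanes easy to read.
y0 = (5 << 32) | 7
y2 = (0x11 << 32) | 3
c3, lo, t = extract_step(pack(y0, y2))
```

After the shift, the low lane of c3 is the carry floor(y_0/B), while the
high lane is just the upper bits of y_2, which nothing downstream needs;
hence the `?' in the comment.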

math/mpx-mul4-amd64-sse2.S
math/mpx-mul4-x86-sse2.S

index d8f54e1..84f9e3f 100644
        // Finally extract the answer.  This complicated dance is better than
        // storing to memory and loading, because the piecemeal stores
        // inhibit store forwarding.
-       movdqa  \c3, \t                 // (y_0, y_1)
+       movdqa  \c3, \t                 // (y_0, ?)
        movdqa  \lo, \t                 // (y^*_0, ?, ?, ?)
        psrldq  \t, 8                   // (y_2, 0)
        psrlq   \c3, 32                 // (floor(y_0/B), ?)
index cdc3596..ee741d2 100644
        // Finally extract the answer.  This complicated dance is better than
        // storing to memory and loading, because the piecemeal stores
        // inhibit store forwarding.
-       movdqa  \c3, \t                 // (y_0, y_1)
+       movdqa  \c3, \t                 // (y_0, ?)
        movdqa  \lo, \t                 // (y^*_0, ?, ?, ?)
        psrldq  \t, 8                   // (y_2, 0)
        psrlq   \c3, 32                 // (floor(y_0/B), ?)