math/mpx-mul4-amd64-sse2.S: Improve the end-of-loop condition testing.
author Mark Wooding <mdw@distorted.org.uk>
Thu, 7 Nov 2019 01:54:57 +0000 (01:54 +0000)
committer Mark Wooding <mdw@distorted.org.uk>
Sat, 9 May 2020 19:57:33 +0000 (20:57 +0100)
Previously, I waited until `rdi' was set up for the next iteration
before comparing it against the limit.  But in fact, `DV' already has
the right value, so we can compare earlier.
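
As a rough illustration of why the reordering is safe, here is a C sketch of the two loop shapes.  It is not the assembler itself: `body' is a hypothetical stand-in for `mont4', and the names `loop_old' and `loop_new' are invented for this note.  The point is only that the working pointer is a plain copy of `dv', so testing `dv' directly gives the same answer and skips the setup on the final pass.

```c
#include <stddef.h>

/* Hypothetical stand-in for the loop body (`mont4' in the real code). */
static void body(const unsigned *z, const unsigned *x, unsigned long *acc)
{
  *acc = 31 * *acc + *z + *x;
}

/* Old shape: set up the working pointers first, then test the copy. */
static unsigned long loop_old(const unsigned *dv, const unsigned *dvlo,
                              const unsigned *nv)
{
  unsigned long acc = 0;

  for (;;) {
    const unsigned *z = dv;             /* -> Z = dv[i] */
    const unsigned *x = nv;             /* -> X = nv[0] */
    if (z >= dvlo) break;               /* all done yet? */
    dv += 4;                            /* 16 bytes = 4 32-bit words */
    body(z, x, &acc);
  }
  return acc;
}

/* New shape: test `dv' directly, and only then set up the pointers. */
static unsigned long loop_new(const unsigned *dv, const unsigned *dvlo,
                              const unsigned *nv)
{
  unsigned long acc = 0;

  for (;;) {
    if (dv >= dvlo) break;              /* all done yet? */
    const unsigned *z = dv;             /* -> Z = dv[i] */
    const unsigned *x = nv;             /* -> X = nv[0] */
    dv += 4;
    body(z, x, &acc);
  }
  return acc;
}
```

Both functions visit the same sequence of pointers; the new shape simply avoids the two pointer loads once the limit has been reached.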

math/mpx-mul4-amd64-sse2.S

index da3e6d6..1c205a7 100644
@@ -1270,10 +1270,10 @@ FUNC(mpxmont_redc4_amd64_sse2)
        jb      7b
 
        // All done for this iteration.  Start the next.
-8:     mov     rdi, DV                 // -> Z = dv[i]
-       mov     rbx, NV                 // -> X = nv[0]
-       cmp     rdi, DVLO               // all done yet?
+       cmp     DV, DVLO                // all done yet?
        jae     9f
+       mov     rdi, DV                 // -> Z = dv[i]
+       mov     rbx, NV                 // -> X = nv[0]
        add     DV, 16
        call    mont4
        add     rdi, 16