From: Mark Wooding <mdw@distorted.org.uk>
Date: Thu, 26 May 2016 08:26:09 +0000 (+0100)
Subject: symm/poly1305.c: Fix 16/32-bit `carry_reduce'.
X-Git-Tag: 2.4.0~62
X-Git-Url: https://git.distorted.org.uk/~mdw/catacomb/commitdiff_plain/57e7040b318f0ffc5ab43c3fb62df9a2bef42ac7

symm/poly1305.c: Fix 16/32-bit `carry_reduce'.

I managed to botch the bounds last time.
---

diff --git a/symm/poly1305.c b/symm/poly1305.c
index 99cc5797..55fe3711 100644
--- a/symm/poly1305.c
+++ b/symm/poly1305.c
@@ -230,9 +230,10 @@ static void carry_reduce(uint32 u[12])
   uint32 c;
 
   /* Do sequential carry propagation (16-bit CPUs are less likely to benefit
-   * from instruction-level parallelism).  Start at u_10; truncate it to 11
-   * bits, and add the carry onto u_11.  Truncate u_11 to 10 bits, and add
-   * five times the carry onto u_0.  And so on.
+   * from instruction-level parallelism).  Start at u_9; truncate it to 11
+   * bits, and add the carry onto u_10.  Truncate u10 to 11 bits, and add the
+   * carry onto u_11.  Truncate u_11 to 10 bits, and add five times the carry
+   * onto u_0.  And so on.
    *
    * The carry is larger than the pieces we're leaving behind.  Let c_i be
    * the high portion of u_i, to be carried onto u_{i+1}.  I claim that c_i <
@@ -241,13 +242,15 @@ static void carry_reduce(uint32 u[12])
    * carry is bounded above by 636*2^22 + 12785*2^10 < 2557*2^20.  Hence, the
    * carry out is at most 2557*2^10, as claimed.
    *
-   * Once we reach u_10 for the second time, we start with u_10 < 2^11.  The
-   * carry into u_10 is at most 2557*2^10 < 1279*2^11 as calculated above; so
-   * the carry out into u_11 is at most 1280.  Since u_11 < 2^10 prior to
-   * this carry in, all of the u_i are now bounded above by 2^11.  The final
-   * reduction therefore only needs a conditional subtraction.
+   * Once we reach u_9 for the second time, we start with u_9 < 2^11.  The
+   * carry into u_9 is at most 2557*2^10 < 1279*2^11 as calculated above; so
+   * the carry out into u_10 is at most 1280.  Since u_10 < 2^11 prior to
+   * this carry in, we now have u_10 < 2^11 + 1280 < 2^12; so the carry out
+   * into u_11 is at most 1.  The final reduction therefore only needs a
+   * conditional subtraction.
    */
-			   {		   c = u[10] >> 11; u[10] &= M11; }
+			   {		   c =  u[9] >> 11;  u[9] &= M11; }
+			   { u[10] +=   c; c = u[10] >> 11; u[10] &= M11; }
 			   { u[11] +=   c; c = u[11] >> 10; u[11] &= M10; }
 			   {  u[0] += 5*c; c =  u[0] >> 11;  u[0] &= M11; }
   for (i = 1; i <  5; i++) {  u[i] +=   c; c =  u[i] >> 11;  u[i] &= M11; }