git.distorted.org.uk Git - u/mdw/putty/blame_incremental

... / ...

Commit	Line	Data
	1	/*
	2	* Elliptic curve arithmetic.
	3	*/
	4
	5	#include "bn-internal.h"
	6	#include "ssh.h"
	7
	8	/* Some background.
	9	*
	10	* These notes are intended to help programmers who are mathematically
	11	* inclined, but not well versed in the theory of elliptic curves. If more
	12	* detail is required, introductory texts of various levels of sophistication
	13	* are readily available. I found
	14	*
	15	* L. Washington, `Elliptic Curves: Number Theory and Cryptography', CRC
	16	* Press, 2003
	17	*
	18	* useful. Further background on algebra and number theory can be found in
	19	*
	20	* V. Shoup, `A Computational Introduction to Number Theory and Algebra',
	21	* version 2, Cambridge University Press, 2008; http://shoup.net/ntb/
	22	* [CC-ND-NC].
	23	*
	24	* We work in a finite field k = GF(p^m). To make things easier, we'll only
	25	* deal with the cases where p = 2 (`binary' fields) or m = 1 (`prime'
	26	* fields). In fact, we don't currently implement binary fields either.
	27	*
	28	* We start in two-dimensional projective space over k, denoted P^2(k). This
	29	* is the space of classes of triples of elements of k { (a x : a y : a z) \|
	30	* a in k^* } such that x y z /= 0. Projective points with z /= 0 are called
	31	* `finite', and correspond to points in the more familiar two-dimensional
	32	* affine space (x/z, y/z); points with z = 0 are called `infinite'.
	33	*
	34	* An elliptic curve is the set of projective points satisfying an equation
	35	* of the form
	36	*
	37	* y^2 z + a_1 x y z + a_3 y z^2 = x^3 + a_2 x^2 z + a_4 x z^2 + a_6 z^3
	38	*
	39	* (the `Weierstrass form'), such that the partial derivatives don't both
	40	* vanish at any point. (Notice that this is a homogeneous equation -- all
	41	* of the monomials have the same degree -- so this is a well-defined subset
	42	* of our projective space. Don't worry about the numbering of the
	43	* coefficients: maybe the later stuff about Jacobian coordinates will make
	44	* them clear.)
	45	*
	46	* Let's consider, for a moment, a straight line r x + s y + t z = 0 which
	47	* meets the curve at two points P and Q; then they must also meet again at
	48	* some other point R. (Here, any or all of P, Q, and R might be equal;
	49	* in this case, the simultaneous equations have repeated solutions.) Let's
	50	* go out on a limb and write
	51	*
	52	* P + Q + R = 0
	53	*
	54	* whenever three points P, Q, and R on the curve are collinear. Obviously
	55	* this notion of `addition' is commutative: three points are collinear
	56	* whichever order we write them. If we designate some point O as being an
	57	* `identity' then we can introduce a notion of inverses. And then some
	58	* magic occurs: this `addition' operation we've just invented is also
	59	* associative -- so it's an abelian group operation! (This can be
	60	* demonstrated by slogging through the algebra, but there's a fancy way
	61	* involving the Weil pairing. Washington's book has both versions.)
	62	*
	63	* So we have an abelian group. It's actually quite a fun group, for
	64	* complicated reasons I'm not going to do into here. For our purposes it's
	65	* sufficient to note that, in most elliptic curve groups, the discrete
	66	* logarithm problem -- finding x given x P -- seems really very hard. I
	67	* mean, much harder than in multiplicative subgroups of finite fields,
	68	* where index calculus techniques can give you the answer in subexponential
	69	* time. The best algorithms known for solving this problem in elliptic
	70	* curve groups are the ones which work in any old group G -- and they run in
	71	* O(sqrt(#G)) time.
	72	*
	73	* Staring at the Weierstrass equation some more, we can determine that in
	74	* fact there's exactly one infinite point on the curve: if z = 0 then we
	75	* must have x = 0, so (0 : 1 : 0) is precisely that point. That's kind of
	76	* handy.
	77	*
	78	* The Weierstrass form is pretty horrid, really. Fortunately, it's always
	79	* possible to apply a linear transformation to make it less nasty. If p > 3
	80	* (which it will be for the prime fields we care about), we can simplify
	81	* this to
	82	*
	83	* y^2 z = x^3 + a x z^2 + b z^3 (with 4 a^3 + 27 b^2 /= 0)
	84	*
	85	* and in the binary case we can simplify to
	86	*
	87	* y^2 z + x y z = x^3 + a x^2 z + b z^3 (with b /= 0)
	88	*
	89	* and it's these kinds of forms which we're actually going to work with.
	90	* (There are other forms of elliptic curves which have interesting benefits,
	91	* e.g., the Montgomery and Edwards forms, but they've not really caught on
	92	* in standards.) Linear transformations don't mess up the geometry, so the
	93	* group we constructed above still works -- and, indeed, the transformation
	94	* is an isomorphism of groups. And, finally, the point at infinity we found
	95	* above is still infinite. Let's call that point O, and declare it to be
	96	* our additive identity. That leaves the finite points (x : y : 1), which
	97	* correspond to the affine points
	98	*
	99	* y^2 = x^3 + a x + b or y^2 + x y = x^3 + a x^2 + b
	100	*
	101	* So, if we want to add two points P and Q, we draw the line between them;
	102	* if P = Q then the line in question is the tangent to the curve at P. (The
	103	* smoothness condition above is precisely what we need for this tangent to
	104	* be well-defined. Unsmooth curves aren't worth worrying about, because
	105	* they're cryptographically useless: they're isomorphic to some other group
	106	* in which discrete logs are very easy.) In prime fields, if this line is
	107	* vertical then it meets the curve again at O, and P + Q = O; otherwise it
	108	* meets the curve at some other finite point R = (x, y), and the answer we
	109	* want is P + Q = -R = (x, -y). In binary fields, the situation is similar
	110	* but a little messier because of the x y term we couldn't eliminate.
	111	*
	112	* It's not very hard to work out where this point R is. You get a cubic
	113	* equation out the far end which looks kind of daunting until you notice
	114	* that the negation of the x^2 coefficient is exactly the sum of the roots,
	115	* and you already know two of them. Unfortunately, the solution is still
	116	* somewhat unpleasant and involves a bunch of division -- and in finite
	117	* fields that involves computing a modular inverse, which is rather slow.
	118	*
	119	* Wouldn't it be really nice if we could somehow keep the denominators
	120	* separate, so we could just do a few `big' divisions at the end of some
	121	* calculation? Of course, we'd need to tweak the point-addition formulae so
	122	* that we could cope with input points which have separate denominators.
	123	* And so we enter the realm of fun coordinate systems.
	124	*
	125	* Perhaps an obvious choice of coordinates is the projective form above:
	126	* rather than working out (x/m, y/n), we could just keep track of a
	127	* projective point (x : y : m n). It turns out that there's a slightly
	128	* different variant which is a little better: `Jacobian coordinates' are
	129	* classes of triples (x :: y :: z), where x y z /= 0, and the point (x :: y
	130	* :: z) is equivalent to (a^2 x :: a^3 y :: a z).
	131	*
	132	* The (simplified) Wieierstrass equation, using Jacobian projective
	133	* coordinates, looks like this.
	134	*
	135	* y^2 = x^3 + a x z^4 + b z^6
	136	*
	137	* or
	138	*
	139	* y^2 + x y z = x^3 + a x^2 z^2 + b z^6
	140	*
	141	* (Maybe this is a hint about the strange coefficient numbering in the
	142	* general Weierstrass form.) Anyway, if a point is infinite then we have
	143	* y^2 = x^3, which has the obvious solution x = y = 1.
	144	*
	145	* That's about all the theory we need for now. Let's get into the code.
	146	*/
	147
	148	/* Let's start by making an abstract interface for finite field arithmetic.
	149	* If we need to do clever things later then this will be a convenient
	150	* extension point (e.g., supporting binary fields, working in extension
	151	* fields as we'd need if we wanted to implement Tate or Weil pairings, or
	152	* just doing different kinds of optimizations for the prime case).
	153	*
	154	* The conversions between field elements and bignums deserve more
	155	* explanation. They are intended to align with the IEEE 1363 FE2OSP/OS2FEP
	156	* conversions, so that a big-endian radix-256 representation of the integer
	157	* (padded with leading zeroes as necessary) is precisely the octet-string
	158	* representation of the field element. For prime fields GF(p) == Z/pZ, a
	159	* field element is a residue class, and the appropriate integer is simply
	160	* the canonical class representative, i.e., the unique element n of the
	161	* class such that 0 <= n < p.
	162	*
	163	* Now consider an extension field GF(q^m) over GF(q). This can be viewed as
	164	* a GF(q)-vector space, and there are a number of choices of basis. A
	165	* popular choice is a polynomial basis: we represent GF(q^m) as
	166	* GF(q)[t]/(p(t)) for a degree-m polynomial irreducible over GF(q), and then
	167	* take (1 t ... t^{m-1}) as a basis; but Gaussian bases, of the form
	168	* (b^{q^{m-1}} ... b^q b) are also popular. Anyway, we start with a vector
	169	* (u_0 u_1 ... u_{m-1}) of GF(q) elements which (inductively) we can
	170	* convert to integers n_i with 0 <= n_i < q; then we can use the obvious
	171	* arithmetic code to map our element of GF(q^m) to u_0 + u_1 q + ... +
	172	* u_{m-1} q^{m-1}.
	173	*
	174	* This is quite cheesy, but it's enough for now.
	175	*/
	176
	177	union fieldelt {
	178	Bignum n;
	179	};
	180
	181	struct field {
	182	const struct field_ops ops; / Field operations table */
	183	union fieldelt zero, one; /* Identity elements */
	184	Bignum q; /* The field size */
	185	size_t eltbits, eltsz; /* Bits/octets for a field element */
	186	};
	187
	188	struct field_ops {
	189	void (destroy)(struct field f); /* Destroy the field object */
	190
	191	void (init)(struct field f, union fieldelt *x);
	192	/* Initialize x (to null) */
	193
	194	void (free)(struct field f, union fieldelt *x);
	195	/* Free an element x, leaving it
	196	* null
	197	*/
	198
	199	void (copy)(struct field f, union fieldelt d, union fieldelt x);
	200	/* Make d be a copy of x */
	201
	202	void (frombn)(struct field f, union fieldelt *d, Bignum n);
	203	/* Convert n to an element */
	204
	205	Bignum (tobn)(struct field f, union fieldelt *x);
	206	/* Convert x to an integer */
	207
	208	int (zerop)(struct field f, union fieldelt *x);
	209	/* Return nonzero if x is zero */
	210
	211	void (dbl)(struct field f, union fieldelt d, union fieldelt x);
	212	/* Store 2 x in d */
	213
	214	void (add)(struct field f, union fieldelt *d,
	215	union fieldelt x, union fieldelt y);
	216	/* Store x + y in d */
	217
	218	void (sub)(struct field f, union fieldelt *d,
	219	union fieldelt x, union fieldelt y);
	220	/* Store x - y in d */
	221
	222	void (mul)(struct field f, union fieldelt *d,
	223	union fieldelt x, union fieldelt y);
	224	/* Store x y in d */
	225
	226	void (sqr)(struct field f, union fieldelt d, union fieldelt x);
	227	/* Store x^2 in d */
	228
	229	void (inv)(struct field f, union fieldelt d, union fieldelt x);
	230	/* Store 1/x in d; it is an error
	231	* for x to be zero
	232	*/
	233
	234	int (sqrt)(struct field f, union fieldelt d, union fieldelt x);
	235	/* If x is a square, store a square
	236	* root (chosen arbitrarily) in d
	237	* and return nonzero; otherwise
	238	* return zero leaving d unchanged.
	239	*/
	240
	241	/* For internal use only. The standard methods call on these for
	242	* detailed field-specific behaviour.
	243	*/
	244	void (_reduce)(struct field f, union fieldelt *x);
	245	/* Reduce an intermediate result so
	246	* that it's within whatever bounds
	247	* are appropriate.
	248	*/
	249
	250	/* These functions are only important for prime fields. */
	251	void (dbl)(struct field f, union fieldelt d, union fieldelt x);
	252	/* Store 2 x in d */
	253
	254	void (tpl)(struct field f, union fieldelt d, union fieldelt x);
	255	/* Store 3 x in d */
	256
	257	void (dql)(struct field f, union fieldelt d, union fieldelt x);
	258	/* Store 4 x in d */
	259
	260	void (hlv)(struct field f, union fieldelt d, union fieldelt x);
	261	/* Store x/2 in d */
	262	};
	263
	264	/* Some standard methods for prime fields. */
	265
	266	static void gen_destroy(struct field *f)
	267	{ sfree(f); }
	268
	269	static void prime_init(struct field f, union fieldelt x)
	270	{ x->n = 0; }
	271
	272	static void prime_free(struct field f, union fieldelt x)
	273	{ if (x->n) { freebn(x->n); x->n = 0; } }
	274
	275	static void prime_copy(struct field f, union fieldelt d, union fieldelt *x)
	276	{ if (d != x) { f->ops->free(f, d); if (x->n) d->n = copybn(x->n); } }
	277
	278	static void prime_frombn(struct field f, union fieldelt d, Bignum n)
	279	{ f->ops->free(f, d); d->n = copybn(n); }
	280
	281	static Bignum prime_tobn(struct field f, union fieldelt x)
	282	{ return copybn(x->n); }
	283
	284	static Bignum prime_zerop(struct field f, union fieldelt x)
	285	{ return bignum_cmp(x->n, f->zero.n) == 0; }
	286
	287	static void prime_add(struct field f, union fieldelt d,
	288	union fieldelt x, union fieldelt y)
	289	{
	290	Bignum sum = bigadd(x, y);
	291	Bignum red = bigsub(sum, f->q);
	292
	293	f->ops->free(f, d);
	294	if (!red)
	295	d->n = sum;
	296	else {
	297	freebn(sum);
	298	d->n = red;
	299	}
	300	}
	301
	302	static void prime_sub(struct field f, union fieldelt d,
	303	union fieldelt x, union fieldelt y)
	304	{
	305	Bignum raise = bigadd(x, f->q);
	306	Bignum diff = bigsub(raise, y);
	307	Bignum drop = bigsub(diff, f->q);
	308
	309	f->ops->free(f, d); freebn(raise);
	310	if (!drop)
	311	d->n = diff;
	312	else {
	313	freebn(diff);
	314	d->n = drop;
	315	}
	316	}
	317
	318	static void prime_mul(struct field f, union fieldelt d,
	319	union fieldelt x, union fieldelt y)
	320	{
	321	Bignum prod = bigmul(x, y);
	322
	323	f->ops->free(f, d);
	324	d->n = prod;
	325	f->ops->_reduce(f, d);
	326	}
	327
	328	static void gen_sqr(struct field f, union fieldelt d, union fieldelt *x)
	329	{ f->ops->mul(f->ops, d, x, x); }
	330
	331	static void prime_inv(struct field f, union fieldelt d, union fieldelt *x)
	332	{
	333	Bignum inv = modinv(x->n, f->q);
	334
	335	f->ops->free(f, d);
	336	d->n = inv;
	337	}
	338
	339	static int prime_sqrt(struct field f, union fieldelt d, union fieldelt *x)
	340	{
	341	Bignum sqrt = modsqrt(x->n, f->q);
	342
	343	if (!sqrt) return 0;
	344	f->ops->free(f, d);
	345	d->n = sqrt;
	346	return 1;
	347	}
	348
	349	static void prime_dbl(struct field f, union_fieldelt d, union fieldelt *x)
	350	{ f->ops->add(f, d, x, x); }
	351
	352	static void prime_tpl(struct field f, union_fieldelt d, union fieldelt *x)
	353	{
	354	union_fieldelt t;
	355
	356	if (d != x) {
	357	f->ops->add(f, d, x, x);
	358	f->ops->add(f, d, d, x);
	359	} else {
	360	f->ops->init(f, &t);
	361	f->ops->copy(f, d, x);
	362	f->ops->add(f, d, &t, &t);
	363	f->ops->add(f, d, d, &t);
	364	f->ops->free(f, &t);
	365	}
	366	}
	367
	368	static void prime_qdl(struct field f, union fieldelt d,
	369	union fieldelt *x)
	370	{ f->ops->add(f, d, x, x); f->ops->add(f, d, d, d); }
	371
	372	static void prime_hlv(struct field f, union fieldelt d, union fieldelt *x)
	373	{
	374	Bignum t, u;
	375
	376	if (!x->n[0])
	377	f->ops->copy(f, d, x);
	378	else {
	379	/* The tedious answer is to multiply by the inverse of 2 in this
	380	* field. But there's a better way: either x or q - x is actually
	381	* divisible by 2, so the answer we want is either x >> 1 or p -
	382	* ((p - x) >> 1) depending on whether x is even.
	383	*/
	384	if (!(x->n[1] & 1)) {
	385	t = copybn(x->n);
	386	shift_right(t + 1, t[0], 1);
	387	} else {
	388	u = bigsub(f->q, x);
	389	shift_right(u + 1, u[0], 1);
	390	t = bigsub(f->q, u);
	391	freebn(u);
	392	}
	393	f->ops->free(f, d);
	394	d->n = t;
	395	}
	396
	397	/* Now some utilities for modular reduction. All of the primes we're
	398	* concerned about are of the form p = 2^n - d for `convenient' values of d
	399	* -- such numbers are called `pseudo-Mersenne primes'. Suppose we're given
	400	* a value x = x_0 2^n + x_1: then x - x_0 p = x_1 + x_0 d is clearly less
	401	* than x and congruent to it mod p. So we can reduce x below 2^n by
	402	* iterating this trick; if we're unlucky enough to have p <= x < 2^n then we
	403	* can just subtract p.
	404	*
	405	* The values of d we'll be dealing with are specifically convenient because
	406	* they satisfy the following properties.
	407	*
	408	* * We can write d = SUM_i d_i 2^i, with d_i in { -1, 0, +1 }. (This is
	409	* no surprise.)
	410	*
	411	* * d > 0.
	412	*
	413	* * Very few d_i /= 0. With one small exception, if d_i /= 0 then i is
	414	* divisible by 32.
	415	*
	416	* The following functions will therefore come in very handy.
	417	*/
	418
	419	static void add_imm_lsl(BignumInt *x, BignumInt y, unsigned shift)
	420	{
	421	BignumDbl c = y << (shift % BIGNUM_INT_BITS);
	422
	423	for (x += shift/BIGNUM_INT_BITS; c; x++) {
	424	c += *x;
	425	*x = c & BIGNUM_INT_MASK;
	426	c >>= BIGNUM_INT_BITS;
	427	}
	428	}
	429
	430	static void sub_imm_lsl(BignumInt *x, BignumInt y, unsigned shift)
	431	{
	432	BignumDbl c = y << (shift % BIGNUM_INT_BITS);
	433	BignumDbl t;
	434
	435	c--;
	436	for (x += shift/BIGNUM_INT_BITS; c; x++) {
	437	c ^= BIGNUM_INT_MASK;
	438	c += *x;
	439	*x = c & BIGNUM_INT_MASK;
	440	c >>= BIGNUM_INT_BITS;
	441	}
	442	}
	443
	444	static void pmp_reduce(struct field f, union fieldelt xx,
	445	void (add_mul_d_lsl)(BignumInt x,
	446	BignumInt y,
	447	unsigned shift))
	448	{
	449	/* Notation: write w = BIGNUM_INT_WORDS, and B = 2^w is the base we're
	450	* working in. We assume that p = 2^n - d for some d. The job of
	451	* add_mul_d_lsl is to add to its argument x the value y d 2^shift.
	452	*/
	453
	454	BignumInt top, p, x;
	455	unsigned plen, xlen, i;
	456	unsigned topbits, topclear;
	457	Bignum t;
	458
	459	/* Initialization: if x is shorter than p then there's nothing to do.
	460	* Otherwise, ensure that it's at least one word longer, since this is
	461	* required by the utility functions which add_mul_d_lsl will call.
	462	*/
	463	p = f->q + 1; plen = f->q[0];
	464	x = xx->n + 1; xlen = xx->n[0];
	465	if (xlen < plen) return;
	466	else if (xlen == plen) {
	467	t = newbn(plen + 1);
	468	for (i = 0; i < plen; i++) t[i + 1] = x[i];
	469	t[plen] = 0;
	470	freebn(xx->n); xx->n = t;
	471	x = t + 1; xlen = plen + 1;
	472	}
	473
	474	/* Preparation: Work out the bit length of p. At the end of this, we'll
	475	* have topbits such that n = w (plen - 1) + topbits, and topclear = w -
	476	* topbits, so that n = w plen - topclear.
	477	*/
	478	while (plen > 0 && !p[plen - 1]) plen--;
	479	top = p[plen - 1];
	480	assert(top);
	481	if (top & BIGNUM_TOP_BIT) topbits = BIGNUM_INT_BITS;
	482	else for (topbits = 0; top >> topbits; topbits++);
	483	topclear = BIGNUM_INT_BITS - topbits;
	484
	485	/* Step 1: Trim x down to the right number of words. */
	486	while (xlen > plen) {
	487
	488	/* Pick out the topmost word; call it y for now. */
	489	y = x[xlen - 1];
	490	if (!y) {
	491	xlen--;
	492	continue;
	493	}
	494
	495	/* Observe that d == 2^n (mod p). Clear that top word. This
	496	* effectively subtracts y B^{xlen-1} from x, so we must add back d
	497	* 2^{w(xlen-1)-n} in order to preserve congruence modulo p. Since d
	498	* is smaller than 2^n, this will reduce the absolute value of x, so
	499	* we make (at least some) progress. In practice, we expect that d
	500	* is a lot smaller.
	501	*
	502	* Note that w (xlen - 1) - n = w xlen - w - n = w (xlen - plen - 1)
	503	* + topclear.
	504	*/
	505	x[xlen - 1] = 0;
	506	add_mul_d_lsl(x + xlen - plen - 1, y, topclear);
	507	}
	508
	509	/* Step 2: Trim off the high bits of x, beyond the top bit of p. In more
	510	* detail: if x >= 2^n > p, then write x = x_0 + 2^n y; then we can clear
	511	* the top topclear bits of x, and add y d to preserve congruence mod p.
	512	*
	513	* If topclear is zero then there's nothing to do here: step 1 will
	514	* already have arranged that x < 2^n.
	515	*/
	516	if (topclear) {
	517	for (;;) {
	518	y = x[xlen - 1] >> topbits;
	519	if (!y) break;
	520	x[xlen - 1] &= (1 << topbits) - 1;
	521	add_mul_d_lsl(x, y, 0);
	522	}
	523	}
	524
	525	/* Step 3: If x >= p then subtract p from x. */
	526	for (i = plen; i-- && x[i] == p[i]; );
	527	if (i >= plen \|\| x[i] > p[i]) internal_sub(x, p, plen);
	528
	529	/* We're done now. */
	530	d->n[0] = xlen;
	531	bn_restore_invariant(d->n);
	532	}
	533
	534	/* Conversion operations for field elements.
	535	*
	536	* These are from IEEE 1363, among other standards. The format is absurdly
	537	* simple: its just a sequence of octets of the length necessary to represent
	538	* any element. For prime fields GF(p), this is just the big-endian
	539	* radix-256 encoding of the canonical residue-class representative (i.e.,
	540	* the unique element n of the residue class such that 0 <= n < p). Other
	541	* kinds of fields work differently, but the necessary conversions are
	542	* actually done by the tobn/frombn methods, so see the commentary there.
	543	*/
	544
	545	/* Convert an octet string to an element of field f, storing it i in x. The
	546	* octet string is at pp, and is at least nn octets long (one octet per
	547	* unsigned-char element). On success, zero is returned, and pp and nn are
	548	* updated to refer to the unconsumed piece of the input string. On failure,
	549	* -1 is returned, and pp and nn are left with unhelpful values.
	550	*
	551	* Failures can occur for several reasons (e.g., input overruns, or the input
	552	* being out of range); they are not distinguished here.
	553	*/
	554	int os2fep(field f, union fieldelt x, unsigned char *pp, long nn)
	555	{
	556	Bignum xn = 0;
	557	size_t n = f->eltsz;
	558	int rc = -1;
	559
	560	if (*nn < n) goto done;
	561
	562	xn = bignum_from_bytes(*pp, n);
	563	*pp = p + n;
	564	*nn -= n;
	565	if (bignum_cmp(xn, f->q) >= 0) goto done;
	566	f->ops->frombn(f, x, xn);
	567	done:
	568	if (xn) freebn(xn);
	569	return rc;
	570	}
	571
	572	/* Convert a element x of field f to an octet string. The string is written,
	573	* one octet per unsigned-char element, to the buffer at *pp, which is
	574	* assumed to have space for *nn elements. On success, zero is returned, and
	575	* pp and nn are updated to reflect the remaining space in the input
	576	* buffer. On failure, -1 is returned, but the buffer and pp and nn may be
	577	* partially updated.
	578	*
	579	* If you want this function to succeed, you must ensure that there are at
	580	* least f->eltsz bytes free in the output buffer.
	581	*/
	582	int fe2osp(field f, union fieldelt x, unsigned char *pp, long nn)
	583	{
	584	Bignum xn = 0;
	585	size_t n = f->eltsz, i;
	586	unsigned char p = pp;
	587	int rc = -1;
	588
	589	if (*nn < n) goto done;
	590
	591	xn = f->ops->tobn(f, x);
	592	for (i = 0; i < n; i++) p[i] = bignum_byte(xn, n - i - 1);
	593	*pp = p + n;
	594	*nn -= n;
	595	rc = 0;
	596	done:
	597	if (xn) freebn(xn);
	598	return rc;
	599	}
	600
	601	/* Putting together a field-ops structure for a pseudo-Mersenne prime field.
	602	*/
	603
	604	static void pmp_destroy(struct field *f)
	605	{ freebn(f->q); sfree(f); }
	606
	607	static void genfield_setup(struct field *f)
	608	{
	609	Bignum qm1 = bigsub(f->q, One);
	610	f->eltbits = bignum_bitcount(qm1);
	611	f->eltsz = (f->eltbits + 7)/8;
	612	freebn(qm1);
	613	}
	614
	615	static struct field make_pmp_field(const struct field_ops ops,
	616	const unsigned char *p, size_t plen)
	617	{
	618	struct field *f = snew(struct field);
	619
	620	f->ops = ops;
	621	f->zero.n = Zero;
	622	f->one.n = One;
	623	f->q = bignum_from_bytes(p, plen);
	624	genfield_setup(f);
	625
	626	return f;
	627	}
	628
	629	#define DEFINE_PMP_FIELD(fname) \
	630	\
	631	static const struct field_ops fname##_fieldops = { \
	632	pmp_destroy, \
	633	prime_init, prime_free, prime_copy, \
	634	prime_frombn, prime_tobn, \
	635	prime_zerop, \
	636	prime_add, prime_sub, prime_mul, gen_sqr, prime_inv, prime_sqrt, \
	637	fname##_reduce, \
	638	gen_dbl, gen_tpl, gen_qdl, \
	639	prime_hlv \
	640	}; \
	641	\
	642	struct field *make_##fname##_field(void) \
	643	{ \
	644	return make_pmp_field(&fname##_fieldops, \
	645	fname##_p, sizeof(fname##_p)); \
	646	}
	647
	648	/* Some actual field definitions. */
	649
	650	static const unsigned char p256_p[] = {
	651	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x01,
	652	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	653	0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
	654	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	655	};
	656
	657	static void p256_add_mul_d_lsl(BignumInt *x, BignumInt y, unsigned shift)
	658	{
	659	/* p_{256} = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1 */
	660	add_imm_lsl(x, y, shift + 224);
	661	sub_imm_lsl(x, y, shift + 192);
	662	sub_imm_lsl(x, y, shift + 96);
	663	add_imm_lsl(x, y, shift + 0);
	664	}
	665
	666	static void p256_reduce(struct field f, union fieldelt d)
	667	{ pmp_reduce(f, d, p256_add_mul_d_lsl); }
	668
	669	DEFINE_PMP_FIELD(p256)
	670
	671	static const unsigned char p384_p[] = {
	672	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	673	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	674	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	675	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfe,
	676	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
	677	0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff
	678	};
	679
	680	static void p384_add_mul_d_lsl(BignumInt *x, BignumInt y, unsigned shift)
	681	{
	682	/* p_{384} = 2^{384} - 2^{128} - 2^{96} + 2^{32} - 1 */
	683	add_imm_lsl(x, y, shift + 128);
	684	add_imm_lsl(x, y, shift + 96);
	685	sub_imm_lsl(x, y, shift + 32);
	686	add_imm_lsl(x, y, shift + 0);
	687	}
	688
	689	static void p384_reduce(struct field f, union fieldelt d)
	690	{ pmp_reduce(f, d, p384_add_mul_d_lsl); }
	691
	692	DEFINE_PMP_FIELD(p384)
	693
	694	static const unsigned char p521_p[] = {
	695	0x01, 0xff,
	696	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	697	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	698	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	699	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	700	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	701	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	702	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	703	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	704	};
	705
	706	static void p521_add_mul_d_lsl(BignumInt *x, BignumInt y, unsigned shift)
	707	{ add_imm_lsl(x, y, shift); } /* p_{521} = 2^{521} - 1 */
	708
	709	static void p521_reduce(struct field f, union fieldelt d)
	710	{ pmp_reduce(f, d, p521_add_mul_d_lsl); }
	711
	712	DEFINE_PMP_FIELD(p521)
	713
	714	/* And now, some actual elliptic curves.
	715	*
	716	* As with fields, we start by defining an abstract interface which may have
	717	* various implementations (e.g., different algorithms with different
	718	* performance properties, or based on different kinds of underlying fields.
	719	*/
	720
	721	/* An external-form elliptic curve point. */
	722	struct ecpt {
	723	unsigned f; /* Flags */
	724	#define ECPTF_INF 1u /* Point is at infinity */
	725	union fieldelt x, y; /* x- and y-coordinates */
	726	};
	727
	728	/* An internal-form elliptic curve point. This is where most of the action
	729	* happens.
	730	*/
	731	union iecpt {
	732	struct {
	733	union fieldelt x, y, z; /* Projective-form triple */
	734	} jac; /* Jacobian projective coords */
	735	};
	736
	737	struct ec {
	738	const struct ec_ops ops; / Curve operations table */
	739	struct field f; / The underlying finite field */
	740	struct iecpt p; /* The well-known generator point */
	741	Bignum r; /* Size of the generated group */
	742	unsigned flags; /* Various flags */
	743	#define ECF_MINUS3 256u /* Prime Jacobi: a = -3 */
	744	union {
	745	struct { union fieldelt a, b; } w; /* Simple Weierstrass coeffs */
	746	} u;
	747	};
	748
	749	struct ec_ops {
	750	void (destroy)(struct ec ec); /* Destroy the curve itself */
	751
	752	void (init)(struct ec ec, union iecpt *p);
	753	/* Initialize p (to null) */
	754
	755	void (free)(struct ec ec, union iecpt *p);
	756	/* Free the point p */
	757
	758	void (setinf)(struct ec ec, union iecpt *p);
	759	/* Set p to be the point at infinity
	760	*/
	761
	762	int (infp)(struct ec ec, union iecpt *p);
	763	/* Answer whether p is infinite */
	764
	765	void (copy)(struct ec ec, union iecpt d, union iecpt d);
	766	/* Make p be a copy of d */
	767
	768	void (in)(struct ec ec, union iecpt d, struct ecpt p);
	769	/* Convert external-format p to
	770	* internal-format d
	771	*/
	772
	773	void (out)(struct ec ec, struct ecpt d, union iecpt p);
	774	/* Convert internal-format p to
	775	* external-format d
	776	*/
	777
	778	void (add)(struct ec ec, union iecpt *d,
	779	union iecpt p, union iecpt q);
	780	/* Store in d the sum of the points p
	781	* and q
	782	*/
	783
	784	void (sub)(struct ec ec, union iecpt *d,
	785	union iecpt p, union iecpt q);
	786	/* Store in d the result of
	787	* subtracting q from p
	788	*/
	789
	790	void (dbl)(struct ec ec, union iecpt *d,
	791	union iecpt p, union iecpt q);
	792	/* Store in d the result of adding
	793	* point p to itself.
	794	*/
	795
	796	void (neg)(struct ec ec, union iecpt d, union iecpt p);
	797	/* Store in d the negation of p */
	798
	799
	800	int (oncurvep)(struct ec ec, struct ecpt *p);
	801	/* Is the point p actually on the
	802	* curve?
	803	*/
	804
	805	int (compressy)(struct ec ec, struct ecpt *p, unsigned pc);
	806	/* Return the compressed form of p
	807	* according to the flags set in
	808	* pc.
	809	*/
	810
	811	int (fromx)(struct ec ec, struct ecpt *p);
	812	/* Given x, find and store y so that
	813	* (x, y) is on the curve. If there
	814	* is no such y then return zero;
	815	* otherwise return one.
	816	*/
	817	};
	818
	819	/* Initialize the (external-format) point p to infinity. */
	820	void init_ecpt(struct ec ec, struct ecpt p)
	821	{
	822	struct field *f = ec->f;
	823
	824	p->f = ECPTF_INF;
	825	f->ops->init(f, &p->x);
	826	f->ops->init(f, &p->y);
	827	}
	828
	829	/* Free an external-format point. */
	830	void free_ecpt(struct ec ec, struct ecpt p)
	831	{
	832	struct field *f = ec->f;
	833
	834	p->f = ECPTF_INF;
	835	f->ops->free(f, &p->x);
	836	f->ops->free(f, &p->y);
	837	}
	838
	839	/* The EC2OSP format bits, according to IEEE 1363a-D12. The point at
	840	* infinity is represented by a byte with value zero; a finite point must
	841	* have at least one of COMPR and UNCOMPR set. Point compression works as
	842	* follows: for any x in F, there are at most two y values such that (x, y)
	843	* is on the curve. Therefore we can transmit y as a single bit if we have
	844	* some way of distinguishing the two.
	845	*
	846	* The SORT format is easiest. If there are two different y values, one of
	847	* them has a lexicographically smaller representation; so we can just
	848	* transmit that one as 0, and the other one as 1.
	849	*
	850	* The LSB format (implied if there are no other compression-format flags
	851	* set) is a little cleverer, and works differently according to the
	852	* characteristic of the underlying field. Let y be one of the possible
	853	* y-coordinates.
	854	*
	855	* For prime fields GF(p), the other y-coordinate is -y == p - y (mod p);
	856	* because p is odd, exactly one of these (well, its canonical residue, in
	857	* the interval [0, p)) will be even, so we can transmit y as its least
	858	* significant bit.
	859	*
	860	* For binary fields GF(2^m), the other y-coordinate is y' = y + x (observe
	861	* that (y + x)^2 + x (y + x) = y^2 + x^2 + x y + x^2 = y^2 + x y).
	862	* Therefore, there's a single y coordinate exactly[1] when x = 0; we'll
	863	* compress y to zero in this case. Otherwise, write z = y/x; then y = x z
	864	* and y' = x (z + 1). Obviously z and z + 1 differ in their least
	865	* significant bits, so encode y as the least significant bit of y/x.
	866	*
	867	* [1] Note that the squaring map x \|-> x^2 in a binary field is a
	868	* permutation. More generally, suppose F is a field with characteristic
	869	* p; then the Frobenius map x \|-> x^p is (binomial theorem) linear on
	870	* F. Suppose that x^p = y^p; then (x - y)^p = 0 by linearity. But if
	871	* x /= y then x - y is a zero-divisor, which is impossible.
	872	*/
	873	#define ECF_Y 1u /* The compressed y-coordinate */
	874	#define ECF_COMPR 2u /* Compressed y is present */
	875	#define ECF_UNCOMPR 4u /* Uncompressed y is present */
	876	#define ECF_SORT 8u /* SORT compression (not LSB) */
	877	#define ECF_RESV 0xf0 /* Reserved bits, must be zero */
	878
	879	/* And now some common utilities. */
	880
	881	static void weier_destroy(struct ec *ec)
	882	{
	883	ec->f->ops(ec->f, &ec->u.w.a);
	884	ec->f->ops(ec->f, &ec->u.w.b);
	885	freebn(ec->r);
	886	f->ops->destroy(ec->f);
	887	sfree(ec);
	888	}
	889
	890	static void jac_init(struct ec ec, union iecpt p)
	891	{
	892	ec->f->ops->init(ec->f, &p->jac.x);
	893	ec->f->ops->init(ec->f, &p->jac.y);
	894	ec->f->ops->init(ec->f, &p->jac.z);
	895	}
	896
	897	static void jac_free(struct ec ec, union iecpt p)
	898	{
	899	ec->f->ops->free(ec->f, &p->jac.x);
	900	ec->f->ops->free(ec->f, &p->jac.y);
	901	ec->f->ops->free(ec->f, &p->jac.z);
	902	}
	903
	904	static void jac_setinf(struct ec ec, union iecpt d)
	905	{
	906	ec->f->ops->copy(ec->f, &d->jac.x, &ec->f->one);
	907	ec->f->ops->copy(ec->f, &d->jac.y, &ec->f->one);
	908	ec->f->ops->copy(ec->f, &d->jac.z, &ec->f->zero);
	909	}
	910
	911	static int jac_infp(struct ec ec, union iecpt d)
	912	{ return ec->f->ops->zerop(ec->f, &d->jac.z); }
	913
	914	static void jac_copy(struct ec ec, union iecpt d, union iecpt *d)
	915	{
	916	ec->f->ops->copy(ec->f, &d->jac.x, &p->jac.x);
	917	ec->f->ops->copy(ec->f, &d->jac.y, &p->jac.y);
	918	ec->f->ops->copy(ec->f, &d->jac.z, &p->jac.z);
	919	}
	920
	921	static void jac_in(struct ec ec, union iecpt d, struct ecpt *p)
	922	{
	923	ec->f->ops->copy(ec->f, &d->x, &p->jac.x);
	924	ec->f->ops->copy(ec->f, &d->y, &p->jac.y);
	925	ec->f->ops->copy(ec->f, &d->z, &ec->f->one);
	926	}
	927
	928	static void jac_out(struct ec ec, struct ecpt d, union iecpt *p)
	929	{
	930	union fieldelt iz2, iz3;
	931
	932	if (ec->f->ops->zerop(ec->f, &p->jac.z)) {
	933	ec->f->ops->free(ec->f, d->x);
	934	ec->f->ops->free(ec->f, d->y);
	935	d->f = ECPTF_INF;
	936	} else {
	937	ec->f->ops->init(ec->f, &iz2);
	938	ec->f->ops->init(ec->f, &iz3);
	939
	940	ec->f->ops->inv(ec->f, &iz3, &p->jac.z);
	941	ec->f->ops->sqr(ec->f, &iz2, &iz3);
	942	ec->f->ops->mul(ec->f, &iz3, &iz3, &iz2);
	943
	944	ec->f->ops->mul(ec->f, &d->x, &p->jac.x, &iz2);
	945	ec->f->ops->mul(ec->f, &d->y, &p->jac.x, &iz3);
	946	d->f = 0;
	947
	948	ec->f->ops->free(ec->f, &iz2);
	949	ec->f->ops->free(ec->f, &iz3);
	950	}
	951	}
	952
	953	static void genec_sub(struct ec ec, union iecpt d,
	954	union iecpt p, union iecpt q)
	955	{
	956	union iecpt t;
	957
	958	ec->ops->init(ec, &t);
	959	ec->ops->neg(ec, &t, q);
	960	ec->ops->add(ec, d, p, &t);
	961	ec->ops->free(ec, &t);
	962	}
	963
	964	/* Finally, let's define some actual curve arithmetic functions. */
	965
	966	static void pwjac_dbl(struct ec ec, union iecpt d, union iecpt *p)
	967	{
	968	struct field *f = ec->f;
	969	union fieldelt m, s, t, u;
	970
	971	if (ec->ops->infp(ec, p)) {
	972	ec->setinf(ec, d);
	973	return;
	974	}
	975
	976	f->ops->init(f, &m);
	977	f->ops->init(f, &s);
	978	f->ops->init(f, &t);
	979	f->ops->init(f, &u);
	980
	981	f->ops->sqr(f, &u, &p->jac.z); /* z^2 */
	982	f->ops->sqr(f, &t, &u); /* z^4 */
	983	f->ops->sqr(f, &m, &p->jac.x); /* x^2 */
	984	if (ec->flags & ECF_MINUS3) {
	985	f->ops->sqr(f, &m, &m, &t); /* x^2 - z^4 */
	986	f->ops->tpl(f, &m, &m); /* m = 3 x^2 - 3 z^4 */
	987	} else {
	988	f->ops->mul(f, &u, &t, &ec->u.w.a); /* a z^4 */
	989	f->ops->tpl(f, &m, &m); /* 3 x^2 */
	990	f->ops->add(f, &m, &m, &u); /* m = 3 x^2 + a z^4 */
	991	}
	992
	993	f->ops->dbl(f, &t, &p->jac.y); /* 2 y */
	994	f->ops->mul(f, &d->jac.z, &t, &p->jac.z); /* z' = 2 y z */
	995
	996	f->ops->sqr(f, &u, &t); /* 4 y^2 */
	997	f->ops->mul(f, &s, &u, &p->jac.x); /* s = 4 x y^2 */
	998	f->ops->sqr(f, &t, &u); /* 16 y^4 */
	999	f->ops->hlv(f, &t, &t); /* t = 8 y^4 */
	1000
	1001	f->ops->dbl(f, &u, &s); /* 2 s */
	1002	f->ops->sqr(f, &d->jac.x, &m); /* m^2 */
	1003	f->ops->sub(f, &d->jac.x, &d->jac.x, &u); /* x' = m^2 - 2 s */
	1004
	1005	f->ops->sub(f, &u, &s, &d->jac.x); /* s - x' */
	1006	f->ops->mul(f, &d->jac.y, &m, &u); /* m (s - x') */
	1007	f->ops->sub(f, &d->jac.y, &d->jac.y, &t); /* y' = m (s - x') - t */
	1008
	1009	f->ops->free(f, &m);
	1010	f->ops->free(f, &s);
	1011	f->ops->free(f, &t);
	1012	f->ops->free(f, &u);
	1013	}
	1014
	1015	static void pwjac_add(struct ec ec, union iecpt d,
	1016	union iecpt p, union iecpt q)
	1017	{
	1018	struct field *f = ec->f;
	1019	union fieldelt m, r, s, ss, t, u, uu, w;
	1020
	1021	if (a == b) { pwjac_dbl(ec, d, p); return; }
	1022	else if (ec->ops->infp(ec, p)) { ec->ops->copy(ec, d, q); return; }
	1023	else if (ec->ops->infp(ec, q)) { ec->ops->copy(ec, d, p); return; }
	1024
	1025	f->ops->init(f, &m);
	1026	f->ops->init(f, &r);
	1027	f->ops->init(f, &s);
	1028	f->ops->init(f, &ss);
	1029	f->ops->init(f, &t);
	1030	f->ops->init(f, &u);
	1031	f->ops->init(f, &uu);
	1032	f->ops->init(f, &w);
	1033
	1034	f->ops->sqr(f, &s, &p->jac.z); /* z_0^2 */
	1035	f->ops->mul(f, &u, &s, &q->jac.x); /* u = x_1 z_0^2 */
	1036	f->ops->mul(f, &s, &s, &p->jac.z); /* z_0^3 */
	1037	f->ops->mul(f, &s, &s, &q->jac.y); /* s = y_1 z_0^3 */
	1038
	1039	f->ops->sqr(f, &ss, &q->jac.z); /* z_1^2 */
	1040	f->ops->mul(f, &uu, &ss, &p->jac.x); /* uu = x_0 z_1^2 */
	1041	f->ops->mul(f, &ss, &ss, &q->jac.z); /* z_1^3 */
	1042	f->ops->mul(f, &ss, &ss, &p->jac.y); /* ss = y_0 z_1^3 */
	1043
	1044	f->ops->sub(f, &w, &uu, &u); /* w = uu - u */
	1045	f->ops->sub(f, &r, &ss, &s); /* r = ss - s */
	1046	if (f->ops->zerop(f, w)) {
	1047	if (f->ops->zerop(f, r)) ec->ops->dbl(ec, d, p);
	1048	else ec->ops->setinf(ec, d);
	1049	goto cleanup;
	1050	}
	1051
	1052	f->ops->add(f, &t, &u, &uu); /* t = uu + u */
	1053	f->ops->add(f, &m, &s, &ss); /* m = ss + s */
	1054
	1055	f->ops->mul(f, &uu, &p->jac.z, &w); /* z_0 w */
	1056	f->ops->mul(f, &d->jac.z, &uu, &q->jac.z); /* z' = z_0 z_1 w */
	1057
	1058	f->ops->sqr(f, &s, &w); /* w^2 */
	1059	f->ops->mul(f, &u, &s, &t); /* t w^2 */
	1060	f->ops->mul(f, &s, &s, &w); /* w^3 */
	1061	f->ops->mul(f, &s, &s, &m); /* m w^3 */
	1062
	1063	f->ops->sqr(f, &m, &r); /* r^2 */
	1064	f->ops->sub(f, &d->jac.x, &m, &u); /* x' = r^2 - t w^2 */
	1065
	1066	f->ops->dbl(f, &m, &d->jac.x); /* 2 x' */
	1067	f->ops->sub(f, &u, &u, &m); /* v = t w^2 - 2 x' */
	1068	f->ops->mul(f, &u, &u, &r); /* v r */
	1069	f->ops->sub(f, &u, &u, &s); /* v r - m w^3 */
	1070	f->ops->hlv(f, &d->jac.y, &m); /* y' = (v r - m w^3)/2 */
	1071
	1072	cleanup:
	1073	f->ops->free(f, &m);
	1074	f->ops->free(f, &r);
	1075	f->ops->free(f, &s);
	1076	f->ops->free(f, &ss);
	1077	f->ops->free(f, &t);
	1078	f->ops->free(f, &u);
	1079	f->ops->free(f, &uu);
	1080	f->ops->free(f, &w);
	1081	}
	1082
	1083	static int pweier_oncurvep(struct ec ec, struct ecpt p)
	1084	{
	1085	struct field *f = ec->f;
	1086	union fieldelt a, t;
	1087	int rc;
	1088
	1089	if (p->f & ECPTF_INF) return 1;
	1090
	1091	f->ops->init(f, &a);
	1092	f->ops->init(f, &t);
	1093
	1094	f->ops->sqr(f, &a, &p->x); /* x^2 */
	1095	f->ops->mul(f, &a, &a, &p->x); /* x^3 */
	1096
	1097	f->ops->mul(f, &t, &p->x, &wc->u.w.a); /* a x */
	1098	f->ops->add(f, &a, &a, &t); /* x^3 + a x */
	1099
	1100	f->ops->add(f, &a, &a, &ec->u.w.b); /* x^3 + a x + b */
	1101
	1102	f->ops->sqr(f, &t, &p->y); /* y^2 */
	1103	f->ops->sub(f, &a, &a, &t); /* x^3 + a x + b - y^2 =? 0 */
	1104	rc = f->ops->zerop(f, &a);
	1105
	1106	f->ops->free(f, &a);
	1107	f->ops->free(f, &t);
	1108	return rc;
	1109	}
	1110
	1111	/* Curve-generic SORT-style point compression. */
	1112	static int genec_compressy_sort(struct ec ec, struct ecpt p)
	1113	{
	1114	struct field *f = ec->f;
	1115	struct ecpt mp;
	1116	union iecpt q;
	1117	Bignum y, yy;
	1118	int rc;
	1119
	1120	init_ecpt(ec, &mp);
	1121	ec->ops->init(ec, &q);
	1122
	1123	ec->ops->in(ec, &q, p);
	1124	ec->ops->neg(ec, &q, &q);
	1125	ec->ops->out(ec, &mp, &q);
	1126
	1127	y = f->ops->tobn(f, &p->y);
	1128	yy = f->ops->tobn(f, &mp.y);
	1129	rc = bignum_cmp(y, yy) > 0;
	1130
	1131	freebn(y); freebn(yy);
	1132	free_ecpt(ec, &mp);
	1133	ec->ops->free(ec, &q);
	1134
	1135	return rc;
	1136	}
	1137
	1138	static int prime_compressy(struct ec ec, struct ecpt p, unsigned pc)
	1139	{
	1140	Bignum y;
	1141	int rc;
	1142
	1143	if (pc & ECF_SORT)
	1144	rc = genec_compressy_sort(ec, p);
	1145	else {
	1146	y = ec->f->ops->tobn(ec->f, &p->y);
	1147	rc = (y[0] && (y[1] & 1));
	1148	freebn(y);
	1149	}
	1150	return rc;
	1151	}
	1152
	1153	static int pweier_fromx(struct ec ec, struct ecpt p)
	1154	{
	1155	struct field *f = ec->f;
	1156	union fieldelt a, t, u;
	1157	Bignum yn;
	1158	int rc;
	1159
	1160	f->ops->init(f, &a);
	1161	f->ops->init(f, &t);
	1162
	1163	f->ops->sqr(f, &a, &p->x); /* x^2 */
	1164	f->ops->mul(f, &a, &a, &p->x); /* x^3 */
	1165	f->ops->mul(f, &t, &ec->u.w.a, &p->x); /* a x */
	1166	f->ops->add(f, &a, &a, &t); /* x^3 + a x */
	1167	f->ops->mul(f, &a, &a, &ec->u.w.b); /* x^3 + a x + b */
	1168
	1169	rc = f->ops->sqrt(f, &p->y, &a);
	1170
	1171	f->ops->free(f, &a);
	1172	f->ops->free(f, &t);
	1173	return rc;
	1174	}
	1175
	1176	static const struct ec_ops pwjac_ecops = {
	1177	weier_destroy,
	1178	jac_init, jac_free, jac_setinf, jac_infp, jac_copy, jac_in, jac_out,
	1179	pwjac_add, genec_sub, pwjac_dbl, pwjac_neg,
	1180	pweier_oncurvep, prime_compressy, pweier_fromx
	1181	};
	1182
	1183	/* Conversions between elliptic-curve points and octet strings.
	1184	*
	1185	* The octet-string encoding works like this. The first octet is called PC
	1186	* and contains flags, listed above as Y, COMPR, UNCOMPR, and SORT. If all
	1187	* of the flags are clear, then (a) there are no further octets, and (b) the
	1188	* point is the point at infinity. Otherwise, there is an x-coordinate (in
	1189	* FE2OSP form); if UNCOMPR is set then there will be a y-coordinate (again
	1190	* in FE2OSP form). If COMPR is set then the y-coordinate is compressed into
	1191	* the Y flag (see above for the compression formats). It is acceptable that
	1192	* both COMPR and UNCOMPR are set: this is the `hybrid' encoding, which
	1193	* doesn't seem very useful but we support it anyway (checking that the
	1194	* explicitly provided y-coordinate really does match the compressed
	1195	* version).
	1196	*
	1197	* Finally, if both COMPR and UNCOMPR are clear, but Y is set, we're in the
	1198	* `x-only' case. Note that the two points with the same x-coordinate are
	1199	* additive inverses of each other, say Q and -Q; then n (-Q) = -(n Q), and
	1200	* so if we were only interested in the x-coordinate of n Q, then we only
	1201	* need to know the x-coordinate of Q. There are optimized ways of doing
	1202	* this computation, but we don't use those here. Instead, we just pick a
	1203	* y-coordinate arbitrarily and hope that's OK.
	1204	*/
	1205
	1206	/* Convert an octet string into an elliptic curve point. The input octet
	1207	* string starts at *pp and goes on for nn bytes. Return zero if all is
	1208	* good, and update pp and nn to refer to the unconsumed portion of the
	1209	* input; return -1 if the encoding is bad in some (unspecified) way (e.g.,
	1210	* the encoding is bad, the options provided don't make sense, the point
	1211	* isn't on the curve,x...). In the latter case, pp and nn may be left in
	1212	* some unhelpful state.
	1213	*/
	1214	int os2ecp(struct ec ec, union iecpt p, unsigned char *pp, long nn)
	1215	{
	1216	struct field *f = ec->f;
	1217	unsigned pc;
	1218	struct ecpt pe;
	1219	union fieldelt t;
	1220	union iecpt q;
	1221	int rc = -1;
	1222
	1223	/* Set things up. */
	1224	init_ecpt(ec, &pe);
	1225	f->ops->init(f, &t);
	1226	ec->ops->init(ec, &q);
	1227
	1228	/* Read the PC octet so that we can decode the rest. */
	1229	if (*nn < 1) goto done;
	1230	pc = (pp++); (*nn)--;
	1231
	1232	/* Decode the remaining data. */
	1233	if (pc & ECF_RESV)
	1234	goto done;
	1235	else if (!pc)
	1236	pe.f = ECPTF_INF;
	1237	else {
	1238	pe.f = 0;
	1239	if (os2fep(f, &pe.x, pp, nn)) goto done;
	1240
	1241	if (pc & ECF_UNCOMPR) {
	1242	if (os2fep(f, &pe.y, pp, nn)) goto done;
	1243	if (!ec->ops->oncurvep(ec, &pe)) goto done;
	1244	if (!(pc & ECF_COMPR)) {
	1245	if (pc & ~ECF_UNCOMPR) goto done;
	1246	} else {
	1247	if (ec->ops->compressy(ec, &pe) != (pc & ECF_Y)) goto done;
	1248	}
	1249	ec->ops->in(ec, p, &pe);
	1250	} else {
	1251	if (!ec->ops->fromx(ec, &pe)) goto done;
	1252	ec->ops->in(ec, p, &pe);
	1253	if (!(pc & ECF_COMPR)) {
	1254	if (pc != ECF_Y) goto done;
	1255	} else if (ec->ops->compressy(ec, &pe) != (pc & ECF_Y))
	1256	ec->ops->neg(ec, p, p);
	1257	}
	1258	}
	1259
	1260	rc = 0;
	1261
	1262	done:
	1263	f->ops->free(f, &t);
	1264	free_ecpt(ec, &pe);
	1265	ec->ops->free(ec, &q);
	1266	return rc;
	1267	}
	1268
	1269	/* Convert an elliptic curve point to external format. The point data is
	1270	* written to the buffer at pp, which must have space for at least nn
	1271	* bytes. On successful exit, pp and nn are updated to reflect the unused
	1272	* tail of the buffer, and zero is returned; on failure, -1 is returned, and
	1273	* the buffer may contain partial output.
	1274	*
	1275	* This will always succeed if the buffer is at least 2*ec->f->eltsz + 1
	1276	* bytes long.
	1277	*/
	1278	int ec2osp(struct ec ec, union iecpt p, unsigned pc,
	1279	unsigned char *pp, long nn)
	1280	{
	1281	struct field *f = ec->f;
	1282	unsigned pc;
	1283	struct ecpt pe;
	1284	union fieldelt t;
	1285	int rc = -1;
	1286
	1287	f->ops->init(f, &t);
	1288	init_ecpt(ec, &pe);
	1289	ec->ops->out(ec, &pe, p);
	1290
	1291	if (pe.f & ECPTF_INF)
	1292	pc = 0;
	1293	else if (pc & ECF_COMPR)
	1294	pc = (pc & ~ECF_Y) \| ec->ops->compressy(ec, &pe);
	1295
	1296	if (*nn < 1) goto done;
	1297	(pp++) = pc; (*nn)--;
	1298
	1299	if (pc && fe2osp(f, &pe.x, pp, nn)) goto done;
	1300	if ((pc & ECF_UNCOMPR) && fe2osp(f, &pe.y, pp, nn)) goto done;
	1301
	1302	ec->ops->in(ec, p, &pe);
	1303	rc = 0;
	1304
	1305	done:
	1306	f->ops->free(f, &t);
	1307	free_ecpt(ec, &pe);
	1308	return rc;
	1309	}
	1310
	1311	/* Putting together a predfined elliptic curve. */
	1312
	1313	static void init_fieldelt_from_bytes(struct field f, union fieldelt x,
	1314	unsigned char *p, size_t n)
	1315	{
	1316	Bignum n = bignum_from_bytes(p, n);
	1317	f->ops->init(f, x);
	1318	f->ops->frombn(f, x, n);
	1319	freebn(n);
	1320	}
	1321
	1322	static void init_ecpt_from_bytes(struct ec ec, union iecpt p,
	1323	unsigned char *px, size_t nx,
	1324	unsigned char *py, size_t ny)
	1325	{
	1326	struct ecpt pp;
	1327	struct field *f = ec->f;
	1328
	1329	pp.f = 0;
	1330	init_fieldelt_from_bytes(f, &pp.x, px, nx);
	1331	init_fieldelt_from_bytes(f, &pp.y, py, ny);
	1332	ec->ops->init(ec, p);
	1333	ec->ops->in(ec, p, &pp);
	1334	f->ops->free(f, &pp.x);
	1335	f->ops->free(f, &pp.y);
	1336	}
	1337
	1338	static struct ec make_pwjac_curve(struct field f,
	1339	const unsigned char *a, size_t alen,
	1340	const unsigned char *b, size_t blen,
	1341	const unsigned char *r, size_t rlen,
	1342	const unsigned char *p, size_t plen)
	1343	{
	1344	struct ec *ec = snew(struct ec);
	1345	union fieldelt x;
	1346	unsigned char *q;
	1347	int rc;
	1348	static BignumInt three[] = { 1, 3 };
	1349
	1350	ec->ops = &pwjac_ecops;
	1351	ec->f = f;
	1352	ec->flags = 0;
	1353
	1354	if (a)
	1355	init_fieldelt_from_bytes(ec->f, &ec->u.w.a, a, alen);
	1356	else {
	1357	f->ops->init(f, &ec->u.w.a);
	1358	f->ops->frombn(f, &ec->u.w.a, three);
	1359	f->ops->sub(f, &ec->u.w.a, &f->zero, &ec->u.w.a);
	1360	}
	1361
	1362	init_fieldelt_from_bytes(ec->f, &ec->u.w.b, b, blen);
	1363	ec->r = bignum_from_bytes(r, rlen);
	1364	q = (/unconst/ unsigned char *)p;
	1365	rc = os2ecp(ec, &ec->p, &q, &plen); assert(!rc); assert(!plen);
	1366
	1367	return rc;
	1368	}
	1369
	1370	#define PWJAC_MINUS3(ecname) 0, 0
	1371	#define PWJAC_STD(ecname) ecname##_a, sizeof(ecname##_a)
	1372
	1373	#define DEFINE_PWJAC_CURVE(ecname, field, kind) \
	1374	\
	1375	struct ec *make_##ecname##_curve(void) \
	1376	{ \
	1377	return make_pwjac_curve(make_##field##_field(), \
	1378	kind(ecname), \
	1379	ecname##_b, sizeof(ecname##_b), \
	1380	ecname##_r, sizeof(ecname##_r), \
	1381	ecname##_p, sizeof(ecname##_p)); \
	1382	}
	1383
	1384	/* Some well-known elliptic curves. */
	1385
	1386	static const unsigned char p256_b[] = {
	1387	0x5a, 0xc6, 0x35, 0xd8, 0xaa, 0x3a, 0x93, 0xe7,
	1388	0xb3, 0xeb, 0xbd, 0x55, 0x76, 0x98, 0x86, 0xbc,
	1389	0x65, 0x1d, 0x06, 0xb0, 0xcc, 0x53, 0xb0, 0xf6,
	1390	0x3b, 0xce, 0x3c, 0x3e, 0x27, 0xd2, 0x60, 0x4b
	1391	};
	1392
	1393	static void unsigned char p256_r[] = {
	1394	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
	1395	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1396	0xbc, 0xe6, 0xfa, 0xad, 0xa7, 0x17, 0x9e, 0x84,
	1397	0xf3, 0xb9, 0xca, 0xc2, 0xfc, 0x63, 0x25, 0x51
	1398	};
	1399
	1400	static void unsigned char p256_p[] = {
	1401	ECF_COMPR \| ECF_Y,
	1402	0x6b, 0x17, 0xd1, 0xf2, 0xe1, 0x2c, 0x42, 0x47,
	1403	0xf8, 0xbc, 0xe6, 0xe5, 0x63, 0xa4, 0x40, 0xf2,
	1404	0x77, 0x03, 0x7d, 0x81, 0x2d, 0xeb, 0x33, 0xa0,
	1405	0xf4, 0xa1, 0x39, 0x45, 0xd8, 0x98, 0xc2, 0x96
	1406	};
	1407
	1408	DEFINE_PWJAC_CURVE(p256, p256, PWJAC_MINUS3)
	1409
	1410	static const unsigned char p384_b[] = {
	1411	0xb3, 0x31, 0x2f, 0xa7, 0xe2, 0x3e, 0xe7, 0xe4,
	1412	0x98, 0x8e, 0x05, 0x6b, 0xe3, 0xf8, 0x2d, 0x19,
	1413	0x18, 0x1d, 0x9c, 0x6e, 0xfe, 0x81, 0x41, 0x12,
	1414	0x03, 0x14, 0x08, 0x8f, 0x50, 0x13, 0x87, 0x5a,
	1415	0xc6, 0x56, 0x39, 0x8d, 0x8a, 0x2e, 0xd1, 0x9d,
	1416	0x2a, 0x85, 0xc8, 0xed, 0xd3, 0xec, 0x2a, 0xef
	1417	};
	1418
	1419	static const unsigned char p384_r[] = {
	1420	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1421	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1422	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1423	0xc7, 0x63, 0x4d, 0x81, 0xf4, 0x37, 0x2d, 0xdf,
	1424	0x58, 0x1a, 0x0d, 0xb2, 0x48, 0xb0, 0xa7, 0x7a,
	1425	0xec, 0xec, 0x19, 0x6a, 0xcc, 0xc5, 0x29, 0x73
	1426	};
	1427
	1428	static const unsigned char p384_p[] = {
	1429	ECF_COMPR \| ECF_Y,
	1430	0xaa, 0x87, 0xca, 0x22, 0xbe, 0x8b, 0x05, 0x37,
	1431	0x8e, 0xb1, 0xc7, 0x1e, 0xf3, 0x20, 0xad, 0x74,
	1432	0x6e, 0x1d, 0x3b, 0x62, 0x8b, 0xa7, 0x9b, 0x98,
	1433	0x59, 0xf7, 0x41, 0xe0, 0x82, 0x54, 0x2a, 0x38,
	1434	0x55, 0x02, 0xf2, 0x5d, 0xbf, 0x55, 0x29, 0x6c,
	1435	0x3a, 0x54, 0x5e, 0x38, 0x72, 0x76, 0x0a, 0xb7
	1436	};
	1437
	1438	DEFINE_PWJAC_CURVE(p384, p384, PWJAC_MINUS3)
	1439
	1440	static const unsigned char p521_b[] = {
	1441	0x51,
	1442	0x95, 0x3e, 0xb9, 0x61, 0x8e, 0x1c, 0x9a, 0x1f,
	1443	0x92, 0x9a, 0x21, 0xa0, 0xb6, 0x85, 0x40, 0xee,
	1444	0xa2, 0xda, 0x72, 0x5b, 0x99, 0xb3, 0x15, 0xf3,
	1445	0xb8, 0xb4, 0x89, 0x91, 0x8e, 0xf1, 0x09, 0xe1,
	1446	0x56, 0x19, 0x39, 0x51, 0xec, 0x7e, 0x93, 0x7b,
	1447	0x16, 0x52, 0xc0, 0xbd, 0x3b, 0xb1, 0xbf, 0x07,
	1448	0x35, 0x73, 0xdf, 0x88, 0x3d, 0x2c, 0x34, 0xf1,
	1449	0xef, 0x45, 0x1f, 0xd4, 0x6b, 0x50, 0x3f, 0x00
	1450	};
	1451
	1452	static const unsigned char p521_r[] = {
	1453	0x01, 0xff,
	1454	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1455	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1456	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	1457	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfa,
	1458	0x51, 0x86, 0x87, 0x83, 0xbf, 0x2f, 0x96, 0x6b,
	1459	0x7f, 0xcc, 0x01, 0x48, 0xf7, 0x09, 0xa5, 0xd0,
	1460	0x3b, 0xb5, 0xc9, 0xb8, 0x89, 0x9c, 0x47, 0xae,
	1461	0xbb, 0x6f, 0xb7, 0x1e, 0x91, 0x38, 0x64, 0x09
	1462	};
	1463
	1464	static const unsigned char p521_p[] = {
	1465	ECF_COMPR,
	1466	0x00, 0xc6,
	1467	0x85, 0x8e, 0x06, 0xb7, 0x04, 0x04, 0xe9, 0xcd,
	1468	0x9e, 0x3e, 0xcb, 0x66, 0x23, 0x95, 0xb4, 0x42,
	1469	0x9c, 0x64, 0x81, 0x39, 0x05, 0x3f, 0xb5, 0x21,
	1470	0xf8, 0x28, 0xaf, 0x60, 0x6b, 0x4d, 0x3d, 0xba,
	1471	0xa1, 0x4b, 0x5e, 0x77, 0xef, 0xe7, 0x59, 0x28,
	1472	0xfe, 0x1d, 0xc1, 0x27, 0xa2, 0xff, 0xa8, 0xde,
	1473	0x33, 0x48, 0xb3, 0xc1, 0x85, 0x6a, 0x42, 0x9b,
	1474	0xf9, 0x7e, 0x7e, 0x31, 0xc2, 0xe5, 0xbd, 0x66
	1475	};
	1476
	1477	DEFINE_PWJAC_CURVE(p521, p521, PWJAC_MINUS3)
	1478
	1479	/* Scalar multiplication: store in d the product n*p. This is the payoff for
	1480	* all of the above work.
	1481	*/
	1482	void ec_scalarmul(struct ec ec, union iecpt d, union iecpt *p, Bignum n)
	1483	{
	1484	union iecpt t;
	1485	long i;
	1486	BignumInt v;
	1487
	1488	#define BIT(i) ((n[(i)/BIGNUM_INT_BITS + 1] >> ((i)%BIGNUM_INT_BITS)) & 1)
	1489
	1490	/* Setting up. */
	1491	ec->ops->init(ec, &t); ec->ops->setinf(ec, &t);
	1492	if (!n[0]) goto done;
	1493	i = n[0]*BIGNUM_INT_BITS - 1;
	1494	while (i > 0 && !BIT(i)) i--;
	1495
	1496	/* The main algorithm. This is a straightforward right-to-left double-
	1497	* and-add scalar multiplication. It works like this. We implement a
	1498	* function M defined inductively as follows.
	1499	*
	1500	* M(P, 0, 0, Q) = Q
	1501	* M(P, i + 1, b 2^i + n', Q) = M(P, i, n', 2 Q + b P)
	1502	* (if n' < 2^i)
	1503	*
	1504	* It's easy to show (by induction on i) the important property of this
	1505	* function
	1506	*
	1507	* M(P, i, n, Q) = 2^i Q + n P (if n < 2^i)
	1508	*
	1509	* and, in particular, then
	1510	*
	1511	* M(P, i, n, 0) = n P
	1512	*
	1513	* Many fancier algorithms exist (e.g., window NAF, comb, ...), with
	1514	* various tradeoffs between precomputation and calculation performance.
	1515	* But this will do for now.
	1516	*/
	1517	while (i > 0) {
	1518	ec->ops->dbl(ec, &t, &t);
	1519	if (BIT(i)) ec->ops->add(ec, &t, p);
	1520	i--;
	1521	}
	1522	}