[u/mdw/catacomb] / README.random

Catacomb's random number generators


	Catacomb contains a large number of pseudorandom number
	generators.  Some are dedicated to generating pseudorandom
	numbers; others are ciphers in output-feedback mode.  Some are
	cryptographically strong, others are fast or simple.  All have
	been statistically tested and come out OK.


The `rand' generator

	Cryptographic applications need a source of true random data.
	The `rand' generator is Catacomb's true-random-data source.

	The design is my own.  I wasn't happy with any existing designs,
	and I'm still not -- even Counterpane's Yarrow gives me an
	uneasy feeling[1].  I'm not aware of any cryptanalytic attack
	against my generator, and I'd very much like to hear if anyone
	else knows differently.

	The generator's state is held in an object of type `rand_pool'.
	Data can be added into the pool using the `rand_add' function.
	Random numbers can be extracted by calling `rand_get'.

	It's possible to attach a `noise acquisition' function to a
	rand_pool.  The function `rand_getgood' extracts a number of
	bytes from the generator, calling the noise acquisition function
	if it thinks it's low on randomness.  This shouldn't actually be
	necessary once the generator has been topped up once.  It makes
	the generator very slow.

	It's also possible to key the generator.  The output
	transformation then depends on the key, making it even harder
	for an adversary to work out anything useful about the
	generator.

	The underlying transformation used by the generator is

		x' = E_{H(x)}(x)

	i.e., hash, use the hash as a key, and encrypt.


	[1] The Yarrow paper states that it offers 160 bits-worth of
	    security, and that an adversary can't distinguish its output
	    from real random bits.  I have an attack which takes about
	    2^64 64-bit outputs and tells the difference between
	    Yarrow-160 and real random data, by analysing the frequency
	    of adjacent 64-bit blocks being equal.


Noncryptographic generators

	Two pseudorandom-number generators are supplied which have no
	cryptographic strength whatever.  They are, respectively, a
	traditional linear-congruential generator (lcrand) and a
	Fibonacci generator (fibrand).

	The linear-congruential generator has three (fixed) parameters,
	a, c and p, and calculates

		x[i] = (a x[i - 1] + c) mod p

	The prime p is fairly large (2^32 - 5, in fact), and a and c are
	carefully chosen.  The generator has a period of p - 1.  There
	is one fixed point (i.e., it transforms to itself rather than
	being in the cycle) which must be avoided.

	The Fibonacci generator works by adding together old results:

		x[i] = (x[i - 24] + x[i - 55]) mod 2^32

	It has a very long period, and is the fastest pseudorandom
	number generator in Catacomb.

	These two generators are very useful when the cryptographic
	strength of the output isn't important, for example when
	choosing random numbers for primality testing.


The Blum-Blum-Shub generator

	This is by far the strongest and by even further the slowest
	pseudorandom-number generator in Catacomb.  Its output has been
	proven to be indistinguishable from random data by *any*
	polynomial-time test, assuming that factoring large numbers is
	difficult.

	Use the BBS generator when you're feeling really paranoid, and
	you've got a few hours with nothing to do.


Output-feedback ciphers

	Ciphers in output-feedback mode simply emit a pseudorandom
	stream.  To construct a cipher, you XOR the stream with the
	plaintext.  It's even easier just to use the output as random
	numbers.


Generic interface

	There's a fairly comprehensive generic interface for Catacomb's
	various generators.

	Each generator provides a function which returns a pointer to a
	`generic pseudorandom-number generator', of type `grand'.

	A `grand' is a structured type containing one member, `ops',
	which is a pointer to various useful things.  Let `r' be a
	pointer to a generic pseudorandom-number generator; then,

	r->ops->name		The name of the generator.

	r->ops->max		One greater than the maximum output of
				the `raw' function, or zero if it just
				emits words.

	r->ops->misc(r, op, ...)
				Performs a miscellaneous operation.  The
				various standard `op' values are
				described below.

	r->ops->destroy(r)	Destroy the generic generator.

	r->ops->raw(r)		Return a raw output from the generator.
				For `lcrand', this is an integer in the
				range [0, p); for other generators, it's
				the same as `word'.

	r->ops->byte(r)		Return an octet.  The result is
				uniformly distributed over the integers
				in the interval [0, 256).

	r->ops->word(r)		Returns a word.  The result is uniformly
				distributed over the integers in the
				interval [0, 2^32).

	r->ops->range(r, l)	Returns an integer unformly distributed
				in the interval [0, l), l < 2^32.

	r->ops->fill(r, p, sz)	Fill the sz bytes at address p with
				uniformly distributed bytes (i.e.,
				integers in the interval [0, 256).

	All of these operations are supported by all generators.  Some
	may be inefficient, however, since they're synthesized using
	operations more suited to the particular generator.  For
	example, `word' is not efficient on the linear-congruential
	generator, but is for the Fibonacci generator.

	The misc-ops are as follows:

	misc(r, GRAND_CHECK, op)	Return nonzero if `op' is a
					recognized misc-op for this
					generator.

	misc(r, GRAND_SEEDINT, s)	Seed the generator using the
					integer s.

	misc(r, GRAND_SEEDUINT32, s)	Seed the generator using the
					32-bit word s.

	misc(r, GRAND_SEEDBLOCK, p, sz)	Seed the generator using the
					block of sz bytes starting at
					address p.

	misc(r, GRAND_SEEDMP, m)	Seed the generator using the
					multiprecision integer m.

	misc(r, GRAND_SEEDRAND, r')	Seed the generator using some
					output from the generic pseudo-
					random number generator r'.


Statistical testing

	I generated 10M of data with each of Catacomb's random number
	generators, and tested them with George Marsaglia's DIEHARD
	suite.  While the random data is not included here, the
	following trivial Bourne shell script will generate the exact
	output streams I used, assuming that the supplied `rspit'
	program is in the current PATH:

	for i in des 3des idea blowfish rc5 rc4; do
	  rspit $i -kseed -z10m -o$i.rand
	done
	for i in lcrand fibrand; do
	  rspit $i -p -s0 -z10m -o$i.rand
	done
	rspit rand -p -z10m -orand.rand
	nice rspit bbs -p -s2 -z10m -obbs.rand

	(Note that generating the BBS output data can take over an
	hour.  Make sure you have a good book to read.)

	Looking at the results, I think that they all passed the tests
	relatively convincingly.

	I've not actually included the DIEHARD output files in the
	distribution, because they're quite large, and anyone can
	reproduce my results exactly using the publically available
	DIEHARD distribution and the code supplied.  If you do actually
	want them, send me some mail and I'll send you a 60K tar.gz
	file by (approximate) return.	
--
[mdw]

\f
Local variables:
mode: text
End:
Commit	Line	Data
759513d1	1	Catacomb's random number generators
	2
	3
	4	Catacomb contains a large number of pseudorandom number
	5	generators. Some are dedicated to generating pseudorandom
	6	numbers; others are ciphers in output-feedback mode. Some are
	7	cryptographically strong, others are fast or simple. All have
	8	been statistically tested and come out OK.
	9
	10
	11	The `rand' generator
	12
	13	Cryptographic applications need a source of true random data.
	14	The `rand' generator is Catacomb's true-random-data source.
	15
	16	The design is my own. I wasn't happy with any existing designs,
	17	and I'm still not -- even Counterpane's Yarrow gives me an
	18	uneasy feeling[1]. I'm not aware of any cryptanalytic attack
	19	against my generator, and I'd very much like to hear if anyone
	20	else knows differently.
	21
	22	The generator's state is held in an object of type `rand_pool'.
	23	Data can be added into the pool using the `rand_add' function.
	24	Random numbers can be extracted by calling `rand_get'.
	25
	26	It's possible to attach a `noise acquisition' function to a
	27	rand_pool. The function `rand_getgood' extracts a number of
	28	bytes from the generator, calling the noise acquisition function
	29	if it thinks it's low on randomness. This shouldn't actually be
	30	necessary once the generator has been topped up once. It makes
	31	the generator very slow.
	32
	33	It's also possible to key the generator. The output
	34	transformation then depends on the key, making it even harder
	35	for an adversary to work out anything useful about the
	36	generator.
	37
	38	The underlying transformation used by the generator is
	39
	40	x' = E_{H(x)}(x)
	41
	42	i.e., hash, use the hash as a key, and encrypt.
	43
	44
	45	[1] The Yarrow paper states that it offers 160 bits-worth of
	46	security, and that an adversary can't distinguish its output
	47	from real random bits. I have an attack which takes about
	48	2^64 64-bit outputs and tells the difference between
	49	Yarrow-160 and real random data, by analysing the frequency
	50	of adjacent 64-bit blocks being equal.
	51
	52
	53	Noncryptographic generators
	54
	55	Two pseudorandom-number generators are supplied which have no
	56	cryptographic strength whatever. They are, respectively, a
	57	traditional linear-congruential generator (lcrand) and a
	58	Fibonacci generator (fibrand).
	59
	60	The linear-congruential generator has three (fixed) parameters,
	61	a, c and p, and calculates
	62
	63	x[i] = (a x[i - 1] + c) mod p
	64
65	The prime p is fairly large (2^32 - 5, in fact), and a and c are
66	carefully chosen. The generator has a period of p - 1. There
67	is one fixed point (i.e., it transforms to itself rather than
68	being in the cycle) which must be avoided.
69
70	The Fibonacci generator works by adding together old results:
71
72	x[i] = (x[i - 24] + x[i - 55]) mod 2^32
73
74	It has a very long period, and is the fastest pseudorandom
75	number generator in Catacomb.
76
77	These two generators are very useful when the cryptographic
78	strength of the output isn't important, for example when
79	choosing random numbers for primality testing.
80
81
82	The Blum-Blum-Shub generator
83
84	This is by far the strongest and by even further the slowest
85	pseudorandom-number generator in Catacomb. Its output has been
86	proven to be indistinguishable from random data by any
87	polynomial-time test, assuming that factoring large numbers is
88	difficult.
89
90	Use the BBS generator when you're feeling really paranoid, and
91	you've got a few hours with nothing to do.
92
93
94	Output-feedback ciphers
95
96	Ciphers in output-feedback mode simply emit a pseudorandom
97	stream. To construct a cipher, you XOR the stream with the
98	plaintext. It's even easier just to use the output as random
99	numbers.
100
101
102	Generic interface
103
104	There's a fairly comprehensive generic interface for Catacomb's
105	various generators.
106
107	Each generator provides a function which returns a pointer to a
108	`generic pseudorandom-number generator', of type `grand'.
109
110	A `grand' is a structured type containing one member, `ops',
111	which is a pointer to various useful things. Let `r' be a
112	pointer to a generic pseudorandom-number generator; then,
113
114	r->ops->name The name of the generator.
115
116	r->ops->max One greater than the maximum output of
117	the `raw' function, or zero if it just
118	emits words.
119
120	r->ops->misc(r, op, ...)
121	Performs a miscellaneous operation. The
122	various standard `op' values are
123	described below.
124
125	r->ops->destroy(r) Destroy the generic generator.
126
127	r->ops->raw(r) Return a raw output from the generator.
128	For `lcrand', this is an integer in the
129	range [0, p); for other generators, it's
130	the same as `word'.
131
132	r->ops->byte(r) Return an octet. The result is
133	uniformly distributed over the integers
134	in the interval [0, 256).
135
136	r->ops->word(r) Returns a word. The result is uniformly
137	distributed over the integers in the
138	interval [0, 2^32).
139
140	r->ops->range(r, l) Returns an integer unformly distributed
141	in the interval [0, l), l < 2^32.
142
143	r->ops->fill(r, p, sz) Fill the sz bytes at address p with
144	uniformly distributed bytes (i.e.,
145	integers in the interval [0, 256).
146
147	All of these operations are supported by all generators. Some
148	may be inefficient, however, since they're synthesized using
149	operations more suited to the particular generator. For
150	example, `word' is not efficient on the linear-congruential
151	generator, but is for the Fibonacci generator.
152
153	The misc-ops are as follows:
154
155	misc(r, GRAND_CHECK, op) Return nonzero if `op' is a
156	recognized misc-op for this
157	generator.
158
159	misc(r, GRAND_SEEDINT, s) Seed the generator using the
160	integer s.
161
162	misc(r, GRAND_SEEDUINT32, s) Seed the generator using the
163	32-bit word s.
164
165	misc(r, GRAND_SEEDBLOCK, p, sz) Seed the generator using the
166	block of sz bytes starting at
167	address p.
168
169	misc(r, GRAND_SEEDMP, m) Seed the generator using the
170	multiprecision integer m.
171
172	misc(r, GRAND_SEEDRAND, r') Seed the generator using some
173	output from the generic pseudo-
174	random number generator r'.
175
176
177	Statistical testing
178
179	I generated 10M of data with each of Catacomb's random number
180	generators, and tested them with George Marsaglia's DIEHARD
181	suite. While the random data is not included here, the
182	following trivial Bourne shell script will generate the exact
183	output streams I used, assuming that the supplied `rspit'
184	program is in the current PATH:
185
186	for i in des 3des idea blowfish rc5 rc4; do
187	rspit $i -kseed -z10m -o$i.rand
188	done
189	for i in lcrand fibrand; do
190	rspit $i -p -s0 -z10m -o$i.rand
191	done
192	rspit rand -p -z10m -orand.rand
193	nice rspit bbs -p -s2 -z10m -obbs.rand
194
195	(Note that generating the BBS output data can take over an
196	hour. Make sure you have a good book to read.)
197
198	Looking at the results, I think that they all passed the tests
199	relatively convincingly.
200
201	I've not actually included the DIEHARD output files in the
202	distribution, because they're quite large, and anyone can
203	reproduce my results exactly using the publically available
204	DIEHARD distribution and the code supplied. If you do actually
205	want them, send me some mail and I'll send you a 60K tar.gz
206	file by (approximate) return.
207	--
208	[mdw]
209
210	\f
211	Local variables:
212	mode: text
213	End: