An elderly Frenchman rises every morning at 5, goes out to the street in
front of his house, and sprinkles a white powder up and down the street.

One day, a neighbour, who has watched his routine for many years, confronts
him.  `What is this powder you sprinkle on the street every morning,
Pierre?'

`It's elephant powder,' he replies.  `I sprinkle it to keep the
elephants away.'

`But Pierre,' says the neighbour.  `Everybody knows that there are no
elephants in France.'

Says the older man matter-of-factly, `I guess it must be working, then.'

The joke has its mandatory corresponding serious message.  Cryptography is
like elephant powder.  If it seems to work, and keeps the attackers -- our
elephants -- away, then we trust it.  This isn't really very satisfactory.
When the elephant powder fails, a 2-ton elephant turns up, with three of its
chums in a Mini, and you notice it hiding upside-down in your bowl of
custard.  Let's face it: elephants aren't good at being surreptitious.

But if your cryptography is no good, you may never know.

So, what can we do about the situation?
\begin{itemize}
\item Design simple cryptographic primitives, e.g., block ciphers, hash
  functions.  Develop techniques for analysing them, and attempt to stay
  ahead of the bad guy.
\item Build useful constructions from trusted primitives in a modular way.
\end{itemize}

Here we look at the second part of this approach.

\begin{itemize}
\item Define notions of security.  Now we know what to aim for, and what to
  expect of our components.
\item Prove how the security of a construction relates to the security of
  its primitives.  If it breaks, we know who to blame.
\end{itemize}

An adversary's output (for some input) follows a probability distribution.
We say the adversary succeeds if its output has some specified property, and
we examine the probability with which this happens.

\begin{itemize}
\item Oracles compute using secrets hidden from the adversary.
\item We can restrict the types of queries the adversary makes.
\end{itemize}

Oracles are written as superscripts.  For example, an adversary $A$ given an
oracle computing a function $f$ is written $A^{f(\cdot)}$.

\resetseq

A function $\nu(k)$ is negligible if it vanishes faster than the reciprocal
of any polynomial function of $k$.

We say that a function is (asymptotically) secure in some sense if, for any
polynomial $p(k)$ there is a negligible function $\nu(k)$ such that, for
any construction in the family, parameterized by $k$, no adversary which
runs for time $p(k)$ has success probability better than $\nu(k)$.

Suppose we build an encryption scheme from a one-way function.  We'd like
to prove that the encryption is good if the one-way function is secure.  We
do this by contradiction:
\begin{itemize}
\item Suppose an adversary $A$ breaks the encryption scheme with
  better-than-negligible probability.
\item Show how to use $A$ to break the one-way function, in polynomial
  time, and with better-than-negligible probability.
\item Claim that this violates the assumption of a secure one-way function.
\end{itemize}

This doesn't work with real constructions.  We don't know where the
asymptotics set in, and they can conceal large constants.  It's still
better than nothing.

We constrain the resources we allow the adversary:
\begin{itemize}
\item Running time (including program size).
\item Number of oracle queries.
\item Maximum size of oracle queries.
\end{itemize}
We say that a construction is $(t, q, \epsilon)$-secure if no adversary
which runs in time $t$ and makes $q$ oracle queries can break it with
probability better than $\epsilon$.

This is a much more satisfactory approach.  However, we still have to
assume the concrete security of the underlying primitives.

\resetseq

For our purposes, we can think of sets as being collections of objects.

We use the usual notations for set membership ($x \in X$), intersection ($X
\cap Y$), union ($X \cup Y$) and difference ($X \setminus Y$).  The
notation $\{\, f(x) \mid P(x) \,\}$ describes the set containing those
items $f(x)$ for those $x$ for which the predicate $P(x)$ is true.  The
cardinality $|X|$ of a finite set $X$ is the number of
elements in the set.

The power set $\powerset(X)$ of a set $X$ is the set of all subsets of $X$.

The Cartesian product $X \times Y$ of two sets is the set of ordered pairs
$\{\, (x, y) \mid x \in X, y \in Y \,\}$; we use
exponents to indicate the product of a set with itself: hence, $X^2 = X
\times X$.

A relation $R$ over sets $X$ and $Y$ is a subset $R \subseteq X \times Y$;
we write $R(x, y)$ if $(x, y) \in R$.  Relations between two sets are often
written as infix symbols: e.g., $x \sim y$.

In addition to strings, defined later, we use the following standard sets:

The notation $\dom f$ describes the domain of an arbitrary function $f$;
$\ran f$ describes its range.

We sometimes apply the function notation to sets, indicating that the
function is to be applied to each element: $f(X) = \{\, f(x) \mid x \in X
\,\}$.

A function may also be viewed as a set of argument/result pairs.  We shall
use this view in some of the algorithms we present.  In addition, we
sometimes use the ordered pair notation $(x, y)$ to reinforce the notion of
a mapping.

We might write, for example,
\[ f \gets f \cup \{ (x, y) \} \]
to augment a function by the addition of a new mapping.  This is clearly
only valid if $x \notin \dom f$ (or $f(x) = y$) initially.

Suppose $A$ is an alphabet.  The set of sequences of exactly $n$ symbols
from $A$ is written $A^n$.  The length $|a|$ of a string $a$ is the
number $n$ where $a \in A^n$.

The concatenation of two strings $x$ and $y$, written $x \cat y$, or
sometimes just $x y$, is the result of writing the symbols in
$x$ followed by the symbols in $y$, in order.  We have $|x y| = |x| + |y|$.

There are natural (injective) mappings between bit strings and natural
numbers.  Given a string $a = a_{k-1} a_{k-2} \cdots a_1 a_0 \in \{0,
1\}^k$, one mapping associates with it the natural number
\[ \sum_{0 \le i < k} 2^i a_i. \]
The other natural mapping reads the bits in the opposite order.
It doesn't matter which you choose, as long as you're consistent.

For simplicity's sake, we shall tend to switch between strings and the
numbers (and occasionally more exotic objects) they represent implicitly.
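As a concrete illustration, here is a Python sketch of the two conventions
(the helper names are our own, not the text's):

```python
# Two natural injective mappings from bit strings to natural numbers.
# A bit string is represented here as a Python string of '0'/'1' characters.

def lsb_first(s):
    """Read the string with its least significant bit first."""
    return sum(2**i for i, bit in enumerate(s) if bit == '1')

def msb_first(s):
    """Read the string with its most significant bit first."""
    return sum(2**(len(s) - 1 - i) for i, bit in enumerate(s) if bit == '1')

# The conventions disagree on most strings, but each is injective on
# strings of a fixed length -- which is all that matters here.
print(lsb_first('110'), msb_first('110'))  # 3 6
```

Either convention works, as the text says; the only mistake is to mix them.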

We'll find it useful to be able to break up strings in the algorithms we
present.  We use a statement of the form
\[ x_0 \cat x_1 \cat \cdots \cat x_k \gets x \]
(annotating each $x_i$ with an intended length $n_i$) to mean that the
string $x$ is broken up into individual strings $x_0$, $x_1$, \ldots,
$x_k$, such that $|x_i| = n_i$ for each $i$.

We may omit one of the $n_i$ length indicators, since it can be deduced
from the length of the string $x$ and the other $n_j$.

We shall abuse set membership notation for vectors; i.e., we write $x \in
X$ to mean that every element of the vector $x$ is a member of the set $X$.

When we apply functions to vectors, we mean that they are applied
elementwise.

The support of a probability distribution is the set of values which occur
with nonzero probability.  The elements of a random vector may be chosen
independently at random, according to the distribution concerned.

\resetseq

We need ways of comparing probability distributions.

Let $X$ and $Y$ be random variables.  The distributions of $X$ and $Y$ are
statistically close if, summing over all possible values $x$ of $X$, we
have
\[ \sum_x \bigl| \Pr[X = x] - \Pr[Y = x] \bigr| \le \epsilon \]
for a suitably small $\epsilon$.

Fix a value of $k$.  Again, let $X$ be distributed according to one family
of distributions, parameterized by $k$, and $Y$ according to another.

We say that two families of distributions are computationally
indistinguishable if no `efficient' algorithm can tell them apart with
better than `negligible' probability: there is a negligible function $\nu$
such that, for any polynomial-time algorithm $D$ and
any $k$:
\[ \bigl| \Pr[D(X) = 1] - \Pr[D(Y) = 1] \bigr| \le \nu(k). \]

Statistical closeness implies computational indistinguishability.  The
converse fails: families which are statistically far apart can still be
computationally indistinguishable.

Suppose we throw $q$ balls into $n$ bins at random (with $q \le n$).  Let
$C$ be the event that some ball lands in a bin occupied by a
previous one, and let $C_i$ be the event that the $i$-th ball lands in a
bin occupied by a previous one.  Obviously, the worst case for this is when
none of the other balls have collided, so
\[ \Pr[C_i] \le \frac{i - 1}{n}. \]
Then
\[ \Pr[C] \le \sum_{1 \le i \le q} \Pr[C_i]
          \le \sum_{1 \le i \le q} \frac{i - 1}{n}
          = \frac{q (q - 1)}{2 n}. \]
This is an extremely useful result, and we shall need it often.
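The bound is easy to check numerically (a Python sketch; the simulation and
names are ours):

```python
import random

def collision_prob(q, n, trials=10000):
    """Estimate the probability that q balls thrown into n bins collide."""
    hits = 0
    for _ in range(trials):
        bins = [random.randrange(n) for _ in range(q)]
        if len(set(bins)) < q:      # some bin received two balls
            hits += 1
    return hits / trials

q, n = 20, 365
bound = q * (q - 1) / (2 * n)       # the union bound q(q-1)/2n
est = collision_prob(q, n)
print(est <= bound)                  # the estimate respects the bound
```

With $q = 20$ and $n = 365$ this is the familiar birthday paradox: the
collision probability is already over $0.4$, comfortably below the bound of
about $0.52$.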

\resetseq

Intuition: a one-way function is easy to compute but hard to invert.

Now we play a game:
\begin{itemize}
\item Choose $x$ at random from the function's domain, and compute $y \gets
  f(x)$.  Give $y$ to an adversary $I$ (an `inverter').
\item Let $x'$ be the output of $I$ when run on input $y$.
\item We say that $I$ `wins' if $f(x') = y$; otherwise it loses.
\end{itemize}
Note that we don't care whether $x = x'$.

The success probability of an inverter $I$ against $f$
is
\[ \Pr[x \getsr \{0, 1\}^k;
       y \gets f(x);
       x' \gets I(y) :
       f(x') = y] \]
where the probability is taken over all choices of $x$ and all coin flips
made by $I$.

We measure the insecurity of $f$ as the largest success probability
over possible inverters:
where the maximum is taken over all $I$ running in time $t$, this gives the
notion of a $(t, \epsilon)$-secure one-way function.
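The game is mechanical enough to write down directly (a Python sketch; the
toy function here is trivially invertible and stands in for $f$ purely to
show the plumbing -- all names are ours):

```python
import random

def inversion_game(f, inverter, domain_bits=16):
    """Play one round of the game; return True if the inverter wins."""
    x = random.getrandbits(domain_bits)   # choose x at random
    y = f(x)                              # compute y = f(x)
    x_prime = inverter(y)                 # the inverter tries to invert
    # The inverter wins if it finds *any* preimage: x' need not equal x.
    return f(x_prime) == y

# A toy, trivially invertible function -- the opposite of one-way.
f = lambda x: x ^ 0xBEEF
inverter = lambda y: y ^ 0xBEEF
print(inversion_game(f, inverter))  # True
```

For a genuine one-way function, of course, no efficient `inverter` should
win this game with better-than-negligible probability.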

A trapdoor one-way function is one where knowledge of a secret makes
inverting the one-way function easy.  This is most useful when the one-way
function is a family.  A trapdoor one-way function $(G, f,
T)$ is a triple of algorithms:
\begin{itemize}
\item The probabilistic algorithm $G$ is given some parameter $k$ and
  returns a pair $(P, K)$, containing the public parameters $P$ for
  computing an instance of the one-way function, and the secret trapdoor
  information $K$ for inverting it.  We write $(P, K) \in G(k)$.
\item The algorithm $f$ implements the one-way function.  That is, if $(P,
  K) \in G(k)$, then $f_P(\cdot)$ is a
  one-way function.
\item The algorithm $T$ inverts $f$ using the trapdoor information.  That
  is, if $(P, K) \in G(k)$ and $y = f_P(x)$ for some $x$, then $y =
  f_P(T_K(y))$.
\end{itemize}

A pseudorandom generator (PRG) `stretches' an input seed into a longer
string which `looks' random.

The advantage of a distinguisher $D$ against the generator $G$ is the
difference between the probability that $D$ returns $1$ given $G$'s output
on a random seed, and the probability that $D$ returns $1$ given a
uniformly random string of the same length.  The insecurity of $G$ is the
maximum advantage, where the maximum is taken over all distinguishers $D$
running in time $t$; if the insecurity is at most $\epsilon$, we say $G$ is
a $(t, \epsilon)$-secure PRG\@.

Say that a PRG $g$ mapping $k$-bit seeds to $L$-bit outputs is trivial if
it doesn't actually stretch its seed, i.e., if
$k \ge L$.
\begin{itemize}
\item Show that trivial PRGs exist.
\item Show that if $g$ is nontrivial, then $g$ is also a one-way function,
  with comparable insecurity.
\end{itemize}
\begin{itemize}
\item The identity function $I(x) = x$ is a trivial PRG, with zero
  insecurity.
\item Given an inverter $A$ for $g$, we can build a distinguisher: if $y$
  is the output of $g$, then $A$ inverts $y$ with probability better than
  negligible, whereas a uniformly random $y$ usually has no preimage at
  all.  A more careful analysis gives a better
  security bound than this simplistic analysis.
\end{itemize}

Suppose $g$ is a PRG which doubles the length of its seed: let $g_0(\cdot)$
be the first $k$ bits of its output, and let
$g_1(\cdot)$ be the second $k$ bits.  Define the sequence of generators

Advantage is a concept used in many definitions of security notions: it is
the difference between the probability that the adversary returns $1$ in
one experiment and the probability that it returns $1$ in the other.
\begin{itemize}
\item The magnitude of the advantage is at most 1.
\item Zero means that the adversary couldn't distinguish.
\item Negative advantage means the adversary got them the wrong way
  around.  There is another adversary which uses the same resources but has
  positive advantage.
\end{itemize}

Let $b$ be the bit that $A$ returns, and let $b^*$ be the experiment's
hidden bit.  Then the advantage can be rewritten in terms of $\Pr[b =
b^*]$.

Addressing the above claims in order: each term in the advantage is a
probability, so their difference has magnitude
at most 1; the remaining claims follow similarly.

All present and correct.

We want to say that $F$ is a strong PRF if adversaries find it hard to
distinguish an instance $F_K$ from a function chosen completely at random
with the same `shape'.

We give the distinguisher an oracle, either for a randomly
selected $F_K$, or for a completely random function, and ask it to try and
say which it has been given.

We define the advantage of a distinguisher $D$ against the PRF $F$ as the
difference between the probability that $D$ returns $1$ given an oracle for
an instance $F_K$, and the probability that it returns $1$ given an oracle
for a completely random function.

The insecurity of the PRF is then measured as the maximum advantage,
where the maximum is taken over all distinguishers $D$ which run for time
$t$ and make at most $q$ oracle queries; if the insecurity is at most
$\epsilon$, we say $F$ is a $(t, q, \epsilon)$-secure PRF\@.

The advantage of a distinguisher $D$ against the PRP $F$ is defined in the
same way, except that the random function is replaced by a random
permutation with the same `shape'.  The insecurity is measured as for
PRFs, and the notion of $(t, q, \epsilon)$-security is the same.
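The distinguishing experiment for PRFs can be sketched as follows (Python;
the `PRF' here is a deliberately terrible toy, and all names are our own
invention):

```python
import random

def prf_game(prf, keybits, outbits, distinguisher):
    """Run one distinguishing experiment; return (hidden bit, guess)."""
    b = random.randrange(2)
    if b == 1:
        key = random.getrandbits(keybits)
        oracle = lambda x: prf(key, x)      # an instance F_K
    else:
        table = {}
        def oracle(x):                      # lazily-sampled random function
            if x not in table:
                table[x] = random.getrandbits(outbits)
            return table[x]
    return b, distinguisher(oracle)

# A worthless 'PRF' that ignores its key entirely.
bad_prf = lambda k, x: x
# Distinguisher: query the oracle at 7; a random function rarely returns 7.
dist = lambda oracle: 1 if oracle(7) == 7 else 0
right = sum(b == g for b, g in (prf_game(bad_prf, 16, 16, dist)
                                for _ in range(1000)))
print(right)  # close to 1000: the distinguisher has advantage near 1
```

A secure PRF would leave every resource-bounded distinguisher guessing
correctly only about half the time.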

For super PRPs, the distinguisher is allowed to make inverse queries: it is
given an oracle for $F_K^{-1}(\cdot)$ as well as for $F_K(\cdot)$.

Since there are two oracles, we count queries to both when evaluating the
insecurity: the maximum is taken over all distinguishers $D$ which run for
time $t$, make $q$ queries to the standard oracle, and $q'$ queries to the
inverse oracle.  If the resulting insecurity is at most $\epsilon$, we say
$F$ is a $(t, q, q', \epsilon)$-secure super PRP\@.

Note that the key length hasn't appeared anywhere in the definition of
insecurity for a PRP.  Derive lower bounds for the insecurity of a PRP with
a $k$-bit key.

\resetseq

We model block ciphers as families of PRPs (not super PRPs).  Most of the
analysis works best on PRFs, though.  We show that a PRP makes a `pretty
good' PRF, as long as it's not over-used.

Let $F$ be any PRP family on $L$-bit blocks.  Then the insecurity of $F$ as
a PRF exceeds its insecurity as a PRP by at most $q (q - 1) / 2^{L+1}$.

If $q$ is small compared with $2^{L/2}$ -- where $L$ is
the block size -- then a PRP makes a good PRF.  This bound is ubiquitous:
we shall meet it often when we examine modes of operation.  We shall
examine the proof, because it illustrates some useful techniques.

This handy lemma states that the difference in the probability of some
outcome between the two games is bounded above by the probability that the
games differ.

Suppose the games proceed identically unless some event $F$ occurs; let $S$
and $S'$ be the outcome events in the two games, so that $S \land \lnot F$
and $S' \land \lnot F$ are the same event, and hence $\Pr[S \land \lnot F]
= \Pr[S' \land \lnot F]$.  We have:
\[ \Pr[S] = \Pr[S \land F] + \Pr[S \land \lnot F]; \qquad
   \Pr[S'] = \Pr[S' \land F] + \Pr[S' \land \lnot F]. \]
Subtracting gives
\[ |\Pr[S] - \Pr[S']| = \bigl| \Pr[S \land F] - \Pr[S' \land F] \bigr|
   \le \Pr[F], \]
as required.

Let $F$ be a pseudorandom
permutation.  We aim to show that $F$ is also a pseudorandom function.

Let $A$ be an adversary which distinguishes~$F$ from a pseudorandom
function in time~$t$, after making $q$ oracle queries.  We consider a
sequence of games $\G0$, $\G1$, \ldots; in each game,
let $S_i$ be the event that the adversary returns~$1$ at the end of the
game.

Game~$\G0$ is the `random function' game.  We give $A$ an oracle which
computes a function chosen completely at random.

Game~$\G1$ is the `PRF' game.  We give $A$ an oracle which computes
$F_K(\cdot)$ for a randomly chosen key $K$.

By definition, then, the advantage of $A$ is $\Pr[S_1] - \Pr[S_0]$.

Game~$\G2$ is the `random permutation' game: we give $A$ an oracle which
computes a permutation chosen completely at random.  The random function of
$\G0$ and the random permutation differ only if the function suffers a
collision -- two distinct queries returning the same answer; let $F_2$ be
the event that this occurs.  By the difference lemma, we
must have
\[ |\Pr[S_2] - \Pr[S_0]| \le \Pr[F_2]. \]
Using the earlier result on collisions, it's easy to see that
\[ \Pr[F_2] \le \frac{q (q - 1)}{2^{L+1}}. \]
If the event
$F_2$ doesn't occur in $\G2$ (i.e., there is no collision), then the random
function is indistinguishable from a random permutation.  So
\[ |\Pr[S_2] - \Pr[S_0]| \le \frac{q (q - 1)}{2^{L+1}}. \]
By definition, we have that $\Pr[S_1] - \Pr[S_2]$ is the advantage of $A$
against the PRP $F$.

We can now tie all of this together.

A simple calculation shows that
\[ \Pr[S_1] - \Pr[S_0]
   = (\Pr[S_1] - \Pr[S_2]) + (\Pr[S_2] - \Pr[S_0])
   \le (\Pr[S_1] - \Pr[S_2]) + \frac{q (q - 1)}{2^{L+1}}. \]
Here we used the bound we computed on the absolute
difference $|\Pr[S_2] - \Pr[S_0]|$; the first term is the advantage of $A$
against the PRP $F$, which is at most the insecurity of $F$ as a
PRP.

Finally, we imposed no restrictions on $A$, except that it run in time $t$
and make $q$ oracle queries.  So our bound applies to
the most successful adversary running with those resource bounds.

Hence, we can maximize, showing that the insecurity of $F$ as a PRF is at
most its insecurity as a PRP plus $q (q - 1) / 2^{L+1}$,
as required.

\resetseq

Hash functions like MD5 and SHA-1 are extremely useful primitives.  What
properties do we expect of them?  This turns out to be an extremely
difficult question to answer.
\begin{itemize}
\item One-wayness.  We've seen a definition for this already.  But it's not
  enough.
\item Collision-resistance.  This is the property usually claimed as the
  requirement for use in digital signature systems.  We'll look at this
  later.
\item Randomness.  What does this mean, when the function is completely
  public?  A distinguishability criterion is clearly hopeless.
\end{itemize}

Practical hash functions are usually built by iterating a compression
function $F$, together with a fixed initial value $I$.  We obtain a hash
function $H$ which transforms an input string $x$ as follows:
\begin{itemize}
\item Pad $x$ to a multiple of $L$ bits in some injective way.  Divide the
  padded string into $L$-bit blocks $x_1$, $x_2$, \ldots, $x_n$.
\item Set $I_0 = I$; for each block, compute $I_{i+1} = F(I_i, x_{i+1})$.
\item The result $H(x) = I_n$.
\end{itemize}
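The iteration is easy to sketch in Python (the compression function below
is a toy built from CRC32, emphatically not collision-resistant; the names
and padding details are our own):

```python
import zlib

def iterated_hash(compress, iv, x, block_bytes=8):
    """Iterate a compression function over injectively padded blocks."""
    # Injective padding: a 0x80 byte, zero bytes, then the original length.
    pad_len = (-(len(x) + 9)) % block_bytes
    padded = x + b'\x80' + b'\x00' * pad_len + len(x).to_bytes(8, 'big')
    state = iv                              # I_0 = I
    for i in range(0, len(padded), block_bytes):
        block = padded[i:i + block_bytes]
        state = compress(state, block)      # I_{i+1} = F(I_i, x_{i+1})
    return state                            # H(x) = I_n

# Toy 32-bit compression function: chain CRC32 (NOT collision-resistant).
toy_compress = lambda s, block: zlib.crc32(block, s)

h = iterated_hash(toy_compress, 0, b'hello world')
```

Real designs like MD5 and SHA-1 follow exactly this shape, with far
stronger compression functions and a length-padding rule much like the one
above.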

Suppose we have two strings $x \ne y$, such that $H(x) = H(y)$; i.e., a
collision of the iterated hash.  Then, for any suffix $z$, the strings $x
\cat z$ and $y \cat z$ also collide, yielding many further collisions of a
very regular shape.  (This is not a devastating observation, because few
sensible test
vectors for hash functions have such obviously regular forms.)

Claim: any collision of $H$ yields a collision of the compression function
$F$.  Let $x_1, \ldots, x_n$ and $x'_1, \ldots, x'_{n'}$ be
the $L$-bit blocks of two distinct (padded) messages, and without loss
of generality suppose that $n \ge n'$.  Let $I_0 = I'_0 = I$, let
$I_{i+1} = F(I_i, x_{i+1})$ and $I'_{i+1} = F(I'_i, x'_{i+1})$.

We prove the result by induction on $n$.  The case $n = 0$ is trivially
true, since there is only one zero-block message.  Suppose, then, that the
result is true for $n$-block messages.  There are three cases to consider.
If the final blocks or the final chaining values differ, we obtain a
collision of $F$ directly; otherwise we may strip the final blocks, leaving
a colliding pair of shorter
messages, and because the remaining messages must differ in at least one
block, we can apply the inductive hypothesis on these shorter messages to
complete the proof.

The statement usually made about a `good' hash function $H$ is that it
should be `difficult' to find a collision: i.e., two preimages $x \ne y$
where $H(x) = H(y)$.  How do we formalize this?  Here's one attempt: no
adversary running in reasonable time can output a colliding pair.

The problem is that, for any fixed $H$ with more inputs than outputs,
collisions certainly exist; so there is some adversary which
already `knows' a collision for $H$ and just outputs the right answer.
It succeeds very quickly, and with probability 1.  So this definition is
impossible to satisfy.

The concept of targeted collision resistance is relatively new, but quite
promising.  TCR-hash functions are families $\{ H_i \}$ of hash
functions.  They're not really keyed, because the indices aren't kept
secret.

When making a signature, an index $i$ is chosen at random, and the
signature for message $m$ is formed over the pair $(i, H_i(m))$.

TCR-hash functions are the subject of ongoing research.  No practical
designs exist at the moment.

Consider the following experiment: the adversary first commits to a target
message $x$; then an index is chosen at random,
$i \getsr \keys H$;
finally, the adversary wins if it can output a second message $y \ne x$
with $H_i(y) = H_i(x)$.

The security of a TCR-hash function is measured as the maximum probability
of winning this experiment,
where the maximum is taken over all adversaries running in time $t$.  We
define $(t, \epsilon)$-security as usual.

In practice, we expect much more than just collision resistance from hash
functions: we expect a certain amount of `random' behaviour.  But this is
hard to quantify.

One approach is to model a hash function as a `random oracle', i.e., an
oracle containing a function chosen at random, used by the construction
under consideration, and made available to the adversary.  The idea is that
reductions can `capture' the knowledge of an adversary by examining the
queries it makes to its random oracle.

A random-oracle proof guarantees nothing once the oracle is replaced by a
real hash function; but it's
better than nothing.

There are a number of `standard' assumptions that are made about the
difficulty of various problems:
\begin{itemize}
\item IFP, the Integer Factorization Problem;
\item QRP, the Quadratic Residuosity Problem;
\item DLP, the Discrete Logarithm Problem;
\item RSAP, the RSA Problem; and
\item CDH, the Computational Diffie-Hellman problem and its variants.
\end{itemize}

\resetseq

We often assume that large integers of the form $n = p q$, where $p$ and
$q$ are primes of roughly the same size, are `hard' to factor.
Specifically, there is no algorithm which will factor such an $n$ in time
bounded by a polynomial function of $\log n$.

The difficulty of various other problems, e.g., Quadratic Residuosity, or
RSA, depends on the difficulty of factoring; however, it is not yet known
whether the ability to solve QRP or RSAP can be used to factor.

The problem of extracting square roots modulo $n = p q$ is provably as hard
as factoring.  This is the basis of Rabin's public key encryption and
digital signature schemes.  We shall analyse these later.

Suppose $Q(n, y)$ is an algorithm which returns an $x$ such that $x^2
\equiv y \pmod{n}$, whenever such an $x$ exists.  Then we can find a
nontrivial factor of $n$ as follows.  Repeat the following steps forever:
\begin{itemize}
\item $x \getsr \{1, 2, \ldots, n - 1\}$;
\item $y \gets x^2 \bmod n$;
\item $x' \gets Q(n, y)$;
\item $p \gets \gcd(x + x', n)$;
\item if $p \ne 1$ and $p \ne n$, return $p$.
\end{itemize}

The program attempts to find two distinct square roots of $y$ mod $n$.
It's easy to see that any value it returns is indeed
a nontrivial factor of $n$.  It remains to prove the probability bound on
$x + x'$ being a nontrivial factor of $n$.

Suppose, then, that $x$ and $y$ are square roots of the same value, with $x
\not\equiv \pm y \pmod{n}$; we show that $\gcd(x + y, n)$ is a nontrivial
factor of
$n$.

Firstly, we claim that, if $p$ is an odd prime then the congruence $x^2
\equiv a \pmod{p}$ has at most the two solutions $\pm x$.

There must exist odd primes $p$, $q$, such that $p|n$ and $q|n$, and $x
\equiv -y \pmod{p}$ but $x \not\equiv -y \pmod{q}$,
since $q$ is odd.  Hence, $p$ divides $x + y$, but $q$ does not, so $\gcd(x
+ y, n)$ is a nontrivial factor of $n$, completing the proof.
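We can watch the reduction work (a Python sketch; the oracle $Q$ here
cheats by knowing $p$ and $q$ -- exactly the power the reduction assumes --
and both primes are chosen $\equiv 3 \pmod{4}$ so square roots are easy to
write down):

```python
import math, random

def sqrt_mod_prime(a, p):
    """A square root of a mod p, for an odd prime p with p % 4 == 3."""
    return pow(a, (p + 1) // 4, p)

def make_oracle(p, q):
    """Build Q(n, y): return some square root of y mod n = p*q, via CRT."""
    def Q(n, y):
        rp = sqrt_mod_prime(y % p, p)
        rq = sqrt_mod_prime(y % q, q)
        inv_q = pow(q, -1, p)           # q^{-1} mod p (Python 3.8+)
        return (rq + q * ((rp - rq) * inv_q % p)) % (p * q)
    return Q

def factor(n, Q):
    while True:
        x = random.randrange(1, n)
        g = math.gcd(x, n)
        if g != 1:
            return g                    # unreasonably lucky: x shares a factor
        x_prime = Q(n, pow(x, 2, n))
        p = math.gcd(x + x_prime, n)
        if p not in (1, n):
            return p                    # succeeds with probability 1/2 per loop

p, q = 10007, 10039                     # toy primes, both 3 mod 4
n = p * q
f = factor(n, make_oracle(p, q))
print(sorted([f, n // f]))              # [10007, 10039]
```

Each iteration succeeds with probability $1/2$, since the oracle's root
agrees with $\pm x$ for only two of the four square roots of $y$.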

For an odd prime $p$, the Legendre symbol $\left(\frac{x}{p}\right)$
indicates whether $x$ is a quadratic residue mod $p$:
\[ \left(\frac{x}{p}\right) = \left\{ \begin{array}{rl}
  0 & \mbox{if $p$ divides $x$} \\
  -1 & \mbox{if $x$ is a quadratic nonresidue mod $p$} \\
  +1 & \mbox{if $x$ is a quadratic residue mod $p$}
\end{array} \right. \]

The Jacobi symbol $\left(\frac{x}{n}\right)$ extends this to odd composite
$n$, multiplicatively over the prime factors of $n$.
This can be efficiently computed without knowledge of the factors of $n$,
using quadratic reciprocity.

If $\left(\frac{x}{n}\right) = 1$, for $n = p q$, then $x$ might be a
quadratic residue or it might not; if not, then we say that $x$ is a
pseudosquare.  If such an $x$ is chosen
at random, then it is a quadratic residue with probability $1/2$,
since we have
\[ \left(\frac{x}{p}\right) = \left(\frac{x}{q}\right) = \pm 1, \]
with each occurring with equal probability.

The problem of distinguishing pseudosquares from quadratic residues is
called the Quadratic Residuosity Problem (QRP).  It is not known how to
solve this problem without factoring $n$.
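The Jacobi symbol really is computable without the factorization, via
quadratic reciprocity; here is a Python sketch of the standard binary
algorithm:

```python
def jacobi(a, n):
    """Compute the Jacobi symbol (a/n) for odd n > 0, without factoring n."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):       # (2/n) = -1 when n is 3 or 5 mod 8
                result = -result
        a, n = n, a                   # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0    # gcd(a, n) > 1 gives symbol 0

# mod 7, the quadratic residues are {1, 2, 4}:
print([jacobi(x, 7) for x in range(1, 7)])  # [1, 1, -1, 1, -1, -1]
```

Note that $\left(\frac{5}{21}\right) = 1$ even though $5$ is a pseudosquare
mod $21$ (it is a nonresidue mod $3$): the symbol alone cannot separate
squares from pseudosquares.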

The (Integer) Discrete Logarithm Problem asks for the solution $x$ given a
congruence $g^x \equiv y \pmod{n}$.  It is believed to be at least as
difficult as factoring.  (The ability to solve discrete logs modulo $n$ is
sufficient to factor $n$.  The best known algorithms for IFP and DLP have
the same running time.)

The problem generalizes to other cyclic groups, e.g., elliptic curves over
finite fields.

\resetseq

The problems of square-root extraction, deciding quadratic residuosity, the
RSA problem, and finding discrete logarithms share the property of being
randomly self-reducible: an instance of the problem can be transformed into
many derived instances, \emph{with a uniform
probability distribution of problem instances}, such that a solution to
one of the derived instances yields a solution to the original one.

This is a good property to have.  It implies that `most' problem instances
are as hard as the hardest instances.

The RSA problem is to compute $e$-th roots modulo $n = p q$, where $e$ is
relatively prime to $(p - 1)(q - 1)$.  Suppose that the algorithm $S(n, e,
y)$ returns an $x$ such that $x^e \equiv y \pmod{n}$ for some fraction of
possible inputs $y$, and
the special symbol $\bot$ otherwise.  The following probabilistic algorithm
then solves the RSA problem for arbitrary $y$.  Repeat the following steps
forever:
\begin{itemize}
\item $r \getsr \{1, 2, \ldots, n - 1\}$;
\item $y' \gets y r^e \bmod n$;
\item $x \gets S(n, e, y')$;
\item if $x \ne \bot$, return $x r^{-1} \bmod n$.
\end{itemize}
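Here is the self-reduction at work (a Python sketch; our stand-in oracle
$S$ answers only for an arbitrary quarter of inputs -- it cheats internally
with the private key, which is fine for a demonstration -- and all
parameters are toy-sized):

```python
import math, random

# Toy RSA parameters (far too small for real use).
p, q = 10007, 10039
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def S(n_, e_, y):
    """A partial oracle: answers only for an arbitrary quarter of inputs."""
    if y % 4 != 0:
        return None                    # stands in for the symbol "bot"
    return pow(y, d, n_)               # cheats using the private key

def rsa_root(n, e, y):
    """Solve y = x^e (mod n) for arbitrary y, given the partial oracle S."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) != 1:
            continue
        y_blind = (y * pow(r, e, n)) % n   # y' = y r^e: freshly randomized
        x = S(n, e, y_blind)
        if x is not None:
            return (x * pow(r, -1, n)) % n # unblind: x/r is an e-th root of y

y = pow(123456, e, n)
x = rsa_root(n, e, y)
print(pow(x, e, n) == y)   # True
```

The blinding step is what makes the reduction work: $y'$ is uniformly
distributed, so the oracle's refusals on particular inputs don't help it
dodge any target $y$.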

Let $G = \langle g \rangle$ be a cyclic group of order $q$.  The
Computational Diffie-Hellman (CDH) assumption states that, given $g^x$ and
$g^y$ for random $x$ and $y$, there is no
probabilistic algorithm which solves the computational Diffie-Hellman
problem -- computing $g^{x y}$ --
in time polynomial in $\log q$.

Obviously, being able to compute discrete logs in $G$ efficiently would
yield a solution to the Diffie-Hellman problem.  But it may be the case
that the Diffie-Hellman problem is easier than the discrete log problem.

The Diffie-Hellman problem is self-reducible.

The computational Diffie-Hellman assumption makes a statement only about
computing the shared secret $g^{x y}$ in its entirety.  Since
Diffie-Hellman is frequently used for key-exchange, what can we say about
the ability of an adversary to guess any of the bits?

The Decisional Diffie-Hellman (DDH) assumption asserts that, if you don't
know $x$ or $y$, then $g^{x y}$ `looks like' a
random element of $G$; that is, that the distributions of the following
experiments are computationally indistinguishable: in the first, choose $x$
and $y$ at random and output $(g^x, g^y, g^{x y})$; in the second, choose
$x$, $y$ and $z$ at random and output $(g^x, g^y, g^z)$.

If $A$ is an algorithm attempting to solve DDH in the group $G$, then we
write its advantage as the difference between the probabilities that $A$
outputs $1$ in the two experiments,
and the insecurity of DDH in $G$ as the maximum advantage,
with the maximum over all algorithms $A$ running in time $t$.

If you can solve the computational Diffie-Hellman problem, you can solve
the decisional version.  If you can compute discrete logs, you can solve
both.

There are groups in which the computational problem is believed to
be hard, but the decisional problem is easy.  In particular, if the group
order $q$ has small factors then the decisional problem isn't hard.

\endinput

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "ips"
%%% End: