An elderly Frenchman rises every morning at 5, goes out to the street in
front of his house, and sprinkles a white powder up and down the street.

One day, a neighbour, who has watched his routine for many years, confronts
him. `What is this powder you sprinkle on the street every morning,
Pierre?'

`It's elephant powder,' he replies. `It keeps the elephants away.'

`But Pierre,' says the neighbour. `Everybody knows that there are no
elephants in France.'

Says the older man matter-of-factly, `I guess it must be working, then.'

The joke has its mandatory corresponding serious message. Cryptography is
like elephant powder. If it seems to work, and keeps the attackers -- our
elephants -- away, then we trust it. This isn't really very satisfactory.
When the elephant powder fails, a 2-ton elephant turns up, with three of its
chums in a Mini, and you notice it hiding upside-down in your bowl of
custard. Let's face it: elephants aren't good at being surreptitious.

But if your cryptography is no good, you may never know.

So, what can we do about the situation?

\item Design simple cryptographic primitives, e.g., block ciphers, hash
functions. Develop techniques for analysing them, and attempt to stay
ahead of the bad guy.
\item Build useful constructions from trusted primitives in a modular way.

Here we look at this second part of the approach.

\item Define notions of security. Now we know what to aim for, and what to
expect of our components.
\item Prove how the security of a construction relates to the security of
its primitives. If it breaks, we know who to blame.

adversary's output (for some input) follows a probability distribution. We

examine the probability with which this happens.

\item Oracles compute using secrets hidden from the adversary.

\item We can restrict the types of queries the adversary makes.

Oracles are written as superscripts. For example, an adversary given a

\resetseq

function of $k$.

We say that a function is (asymptotically) secure in some sense if, for any
polynomial $p(k)$ there is a negligible function $\nu(k)$ such that, for
any construction in the family, parameterized by $k$, no adversary which
runs for time $p(k)$ has success probability better than $\nu(k)$.
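Spelled out with quantifiers (our generic notation, since the notes' formula
is elided here), the definition reads:
\[ \forall p\; \exists \nu\; \forall A, k:\quad
   \mbox{$A$ runs in time $p(k)$} \;\Rightarrow\;
   \Pr[\mbox{$A$ succeeds}] \le \nu(k), \]
with $p$ ranging over polynomials and $\nu$ over negligible functions. Note
the quantifier order: $\nu$ may depend on the time bound $p$, but a single
$\nu$ must cover every adversary running within that bound.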

Suppose we build an encryption scheme from a one-way function. We'd like
to prove that the encryption is good if the one-way function is secure. We
do this by contradiction:

\item Suppose an adversary $A$ breaks the encryption scheme with
better-than-negligible probability.
\item Show how to use $A$ to break the one-way function, in polynomial
time, and with better-than-negligible probability.
\item Claim that this violates the assumption of a secure one-way function.

This doesn't work with real constructions. We don't know where the
asymptotics set in, and they can conceal large constants. It's still
better than nothing.

We constrain the resources we allow the adversary:

\item Running time (including program size).
\item Number of oracle queries.
\item Maximum size of oracle queries.

We say that a construction is $(t, q, \epsilon)$-secure if no adversary
which runs in time $t$ and makes $q$ oracle queries can break it with
probability better than $\epsilon$.

This is a much more satisfactory approach. However, we still have to

\resetseq

For our purposes, we can think of sets as being collections of objects.

We use the usual notations for set membership ($x \in X$), intersection ($X

items $f(x)$ for those $x$ for which the predicate $P(x)$ is true.

elements in the set.

The power set $\powerset(X)$ of a set $X$ is the set of all subsets of $X$.

exponents to indicate the product of a set with itself: hence, $X^2 = X
\times X$.

y)$ if $(x, y) \in R$. Relations between two sets are often written as
infix symbols: e.g., $x \sim y$.

In addition to strings, defined later, we use the following standard sets:

describes the domain of an arbitrary function; $\ran f$ describes its
range.

We sometimes apply the function notation to sets, indicating that the

We shall use this view in some of the algorithms we present. In addition,

ordered pair notation $(x, y)$ to reinforce the notion of a mapping.

We might write, for example,

to augment a function by the addition of a new mapping. This is clearly
only valid if $x \notin \dom f$ (or $f(x) = y$) initially.
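Viewing a finite function as a set of ordered pairs, the augmentation
described above can be sketched in Python (the helper name `augment` is
ours, not the notes'):

```python
def augment(f, x, y):
    """Return f extended with the mapping x -> y.

    Mirrors the side condition in the text: the extension is only
    valid if x is not already in dom f, or f already maps x to y."""
    if x in f and f[x] != y:
        raise ValueError("x already maps to a different value")
    g = dict(f)          # a finite function as a dict of ordered pairs
    g[x] = y
    return g
```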

Suppose $A$ is an alphabet. The set of sequences of exactly $n$ symbols

number $n$ where $a \in A^n$.

\cat y$, or sometimes just $x y$, is the result of writing the symbols in
$x$ followed by the symbols in $y$, in order. We have $|x y| = |x| + |y|$.

There are natural (injective) mappings between bit strings and natural
numbers.

it the natural number

The other natural mapping is

It doesn't matter which you choose, as long as you're consistent.

For simplicity's sake, we shall tend to switch between strings and the
numbers (and occasionally more exotic objects) they represent implicitly.
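The two formulae above are elided, but they are almost certainly the two
usual conventions, sketched here in Python (function names ours):

```python
def be_value(bits):
    """Big-endian convention: the leftmost bit is most significant,
    so x_0 x_1 ... x_{n-1} maps to the sum of x_i * 2^(n-1-i)."""
    n = 0
    for b in bits:
        n = 2 * n + b
    return n

def le_value(bits):
    """Little-endian convention: bit x_i contributes x_i * 2^i."""
    return sum(b << i for i, b in enumerate(bits))
```

Both maps are injective on strings of any one fixed length; mixing the two
conventions is a classic source of interoperability bugs.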

We'll find it useful to be able to break up strings in the algorithms we
present. We use the statement

to mean that the string $x$ is broken up into individual strings $x_0$,
$x_1$, \ldots, $x_k$, such that

We may omit one of the $n_i$ length indicators, since it can be deduced
from the length of the string $x$ and the other $n_j$.
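As a sketch of this splitting notation in Python (the name `split` and the
use of `None` for an omitted length indicator are ours):

```python
def split(x, sizes):
    """Split the string x into consecutive pieces of the given lengths.

    At most one entry of `sizes` may be None; its length is deduced
    from len(x) and the other entries, as in the text."""
    if sizes.count(None) > 1:
        raise ValueError("at most one length may be omitted")
    if None in sizes:
        rest = len(x) - sum(s for s in sizes if s is not None)
        if rest < 0:
            raise ValueError("given lengths exceed len(x)")
        sizes = [rest if s is None else s for s in sizes]
    if sum(sizes) != len(x):
        raise ValueError("lengths don't sum to len(x)")
    out, i = [], 0
    for s in sizes:
        out.append(x[i:i + s])
        i += s
    return out
```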

We shall abuse set membership notation for vectors; i.e., we write $x \in

When we apply functions to vectors, we mean that they are applied

which occur with nonzero probability.

chosen independently at random, according to the distribution

\resetseq

distributions.

all possible values $x$ of $X$, we have

Fix a value of $k$. Again, let $X$ be distributed according to

We say that two families of distributions are computationally
indistinguishable if no `efficient' algorithm can tell them apart with
better than `negligible' probability.

any $k$:

Statistical closeness implies computational indistinguishability.

computationally indistinguishable.

Suppose we throw $q$ balls into $n$ bins at random (with $q \le n$). Let

previous one. Obviously, the worst case for this is when none of the other
balls have collided, so

Then

This is an extremely useful result, and we shall need it often.
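The bound itself is elided above, but in its usual form it says that the
probability of any collision among the $q$ balls is at most $q(q-1)/2n$.
A quick numerical check of that claim:

```python
from fractions import Fraction

def collision_prob(q, n):
    """Exact probability that q balls thrown independently and
    uniformly into n bins produce at least one collision."""
    p_all_distinct = Fraction(1)
    for i in range(q):
        p_all_distinct *= Fraction(n - i, n)
    return 1 - p_all_distinct

def birthday_bound(q, n):
    """The usual upper bound: a union bound over all q(q-1)/2 pairs."""
    return Fraction(q * (q - 1), 2 * n)
```

For example, `collision_prob(23, 365)` already exceeds 1/2 (the birthday
paradox), while the bound evaluates to 253/365.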

\resetseq

Intuition: a one-way function is easy to compute but hard to invert.

Now we play a game:

\item Let $x'$ be the output of $I$ when run on input $y$.
\item We say that $I$ `wins' if $f(x') = y$; otherwise it loses.

Note that we don't care whether $x = x'$.

is
\[ \Pr[x \getsr \dom f;\;
       y \gets f(x);\;
       x' \gets I(y) :
       f(x') = y] \]
where the probability is taken over all choices of $x$ and all coin flips
made by $I$.

over possible inverters:

where the maximum is taken over all $I$ running in time $t$.

one-way function easy. This is most useful when the one-way function is a

T)$ is a triple of algorithms:

\item The probabilistic algorithm $G$ is given some parameter $k$ and
returns a pair $(P, K)$, containing the public parameters $P$ for
computing an instance of the one-way function, and the secret trapdoor
information $K$ for inverting it. We write $(P, K) \in G(k)$.

\item The algorithm $f$ implements the one-way function. That is, if $(P,

one-way function.

\item The algorithm $T$ inverts $f$ using the trapdoor information. That
is, if $(P, K) \in G(k)$ and $y = f_P(x)$ for some $x$, then $y =
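As a toy illustration of the $(G, f, T)$ interface (our choice of textbook
RSA with tiny fixed primes; nothing here is secure, and none of it is from
the notes):

```python
def G(k=None):
    """Generate (P, K): public parameters and trapdoor.  A real
    generator would pick random k-bit primes; we hard-code tiny ones."""
    p, q = 61, 53
    n, e = p * q, 17
    d = pow(e, -1, (p - 1) * (q - 1))   # the trapdoor: d = e^{-1} mod phi(n)
    return (n, e), (n, d)               # P = (n, e), K = (n, d)

def f(P, x):
    """The easy direction: x -> x^e mod n."""
    n, e = P
    return pow(x, e, n)

def T(K, y):
    """Inversion -- easy only given the trapdoor K."""
    n, d = K
    return pow(y, d, n)
```

(`pow(e, -1, m)` for modular inverses needs Python 3.8 or later.)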

A pseudorandom generator (PRG) `stretches' an input seed into a longer
string which `looks' random.

$G$ is:

where the maximum is taken over all distinguishers $D$ running in time

$(t, \epsilon)$-secure PRG\@.

$k \ge L$.

\item Show that trivial PRGs exist.
\item Show that if $g$ is nontrivial, then $g$ is also a one-way function,
with

\item The identity function $I(x) = x$ is a trivial PRG, with

is the output of $g$, then $A$ inverts $y$ with probability

security bound than this simplistic analysis.

$g_1(\cdot)$ be the second $k$ bits. Define the sequence of generators

unhelpful. It probably helps understanding if you make a diagram.

Advantage is a concept used in many definitions of security notions:

\item Zero means that the adversary couldn't distinguish.

\item Negative advantage means the adversary got them the wrong way
around. There is another adversary which uses the same resources but has
positive advantage.

Let $b$ be the bit that $A$ returns, and let $b^*$ be the experiment's
hidden bit. Then

Addressing the above claims in order:

at most 1.
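The identity elided after `Then' is, in the usual treatment (our notation,
with the hidden bit $b^*$ uniform):
\[ \mathrm{Adv}(A) = \Pr[b = 1 \mid b^* = 1] - \Pr[b = 1 \mid b^* = 0]
                   = 2\Pr[b = b^*] - 1, \]
which makes the claims immediate: random guessing achieves advantage zero,
and negating an adversary's output negates its advantage.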

All present and correct.

We want to say that $F$ is a strong PRF if adversaries find it hard to
distinguish an instance $F_K$ from a function chosen completely at random
with the same `shape'.

selected $F_K$, or a completely random function, and ask it to say which
it has been given.

We define the advantage of a distinguisher $D$ against the PRF $F$ as
follows:

The insecurity of the PRF is then measured as

where the maximum is taken over all distinguishers $D$ which run for time
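The experiment behind these definitions can be sketched in Python. Here
HMAC-SHA256 stands in for the PRF family $F$ (our choice, purely
illustrative; the notes don't name a concrete family):

```python
import hashlib, hmac, secrets

def prf_experiment(distinguisher, out_len=16):
    """Run the PRF distinguishing game once.  The distinguisher gets an
    oracle computing either F_K for a random key K (hidden bit 1) or a
    lazily-sampled truly random function (hidden bit 0), and must guess
    the hidden bit."""
    b_star = secrets.randbits(1)
    key = secrets.token_bytes(16)
    table = {}
    def oracle(x):
        if b_star == 1:
            return hmac.new(key, x, hashlib.sha256).digest()[:out_len]
        if x not in table:                    # random function, sampled lazily
            table[x] = secrets.token_bytes(out_len)
        return table[x]
    return distinguisher(oracle) == b_star    # did it guess the hidden bit?
```

A distinguisher that ignores its oracle wins with probability exactly 1/2;
advantage measures how far above chance a cleverer one can get.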

The advantage of a distinguisher $D$ against the PRP $F$ is

PRFs, and the notion of $(t, q, \epsilon)$-security is the same.

the distinguisher is allowed to make inverse queries:

Since there are two oracles, we count queries to both when evaluating the
insecurity:

where the maximum is taken over all distinguishers $D$ which run for time
$t$, make $q$ queries to the standard oracle, and $q'$ queries to the

$F$ is a $(t, q, q', \epsilon)$-secure super PRP\@.

Note that the key length hasn't appeared anywhere in the definition of
insecurity for a PRP. Derive lower bounds for the insecurity of a PRP
with a $k$-bit key.

\resetseq

We model block ciphers as families of PRPs (not super PRPs). Most of the
analysis works best on PRFs, though. We show that a PRP makes a `pretty
good' PRF, as long as it's not over-used.

Let $F$ be any PRP family. Then

the block size -- then a PRP makes a good PRF.
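The elided bound is, in the standard statement of the PRF/PRP switching
lemma (our notation; the notes' macros will differ):
\[ \mathrm{InSec}^{\mathrm{prf}}_F(t, q) \le
   \mathrm{InSec}^{\mathrm{prp}}_F(t, q) + \frac{q(q - 1)}{2^{L+1}}, \]
where $L$ is the block size in bits. The extra term is the collision bound
from earlier with $n = 2^L$ bins: the random-function and random-permutation
games can only differ if two oracle answers collide.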

meet it often when we examine modes of operation. We shall examine the
proof, because it illustrates some useful techniques.

This handy lemma states that the difference in the probability of some
outcome between the two games is bounded above by the probability that the
games differ.
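In symbols (our notation), the lemma says: if $S_0$, $S_1$ and $F$ are
events with
\[ \Pr[S_0 \land \lnot F] = \Pr[S_1 \land \lnot F], \]
then $|\Pr[S_0] - \Pr[S_1]| \le \Pr[F]$.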

We have:

Subtracting gives

as required.

permutation. We aim to show that $F$ is also a pseudorandom function.

Let $A$ be an adversary which distinguishes~$F$ from a pseudorandom
function in time~$t$, after making $q$ oracle queries. We consider a

let $S_i$ be the event that the adversary returns~$1$ at the end of the
game.

Game~$\G0$ is the `random function' game. We give $A$ an oracle

Game~$\G1$ is the `PRF' game. We give $A$ an oracle which computes

By definition, then,

the event that this occurs.

must have

Using the earlier result on collisions, it's easy to see that

$F_2$ doesn't occur in $\G2$ (i.e., there is no collision), then the random
function is indistinguishable from a random permutation. So

By definition, we have

We can now tie all of this together.

A simple calculation shows that

In the second line, we used the bound we computed on the absolute

against a PRP; and in the fifth we used the definition of insecurity for a
PRP.

Finally, we imposed no restrictions on $A$, except that it run in time $t$
and make $q$ oracle queries. So our bound

the most successful adversary running with those resource bounds.

Hence, we can maximize, showing that

as required.

\resetseq

Hash functions like MD5 and SHA-1 are extremely useful primitives. What
properties do we expect of them? This turns out to be an extremely
difficult question to answer.

\item One-wayness. We've seen a definition for this already. But it's not
enough.
\item Collision-resistance. This is the property usually claimed as the
requirement for use in digital signature systems. We'll look at this
later.
\item Randomness. What does this mean, when the function is completely
public? A distinguishability criterion is clearly hopeless.

which transforms an input string $x$ as follows:

\item Pad $x$ to a multiple of $L$ bits in some injective way. Divide the

\item The result $H(x) = I_n$.
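A Python sketch of this iterated construction (the compression function,
padding rule and parameters are our stand-ins, since the notes' choices
are elided here):

```python
import hashlib

L = 64  # toy block size in bits

def F(chain, block):
    """Stand-in compression function (not from the notes): hash the
    chaining value together with the block and truncate."""
    return hashlib.sha256(chain + block).digest()[:L // 8]

def pad(x):
    """Injective padding: append a 1 bit (a 0x80 byte), then zeros,
    then the 8-byte message length -- one standard choice."""
    B = L // 8
    n = len(x)
    x += b"\x80"
    x += b"\x00" * ((-len(x) - 8) % B)
    return x + n.to_bytes(8, "big")

def H(x, IV=b"\x00" * (L // 8)):
    """Iterated hash: I_0 = IV, I_i = F(I_{i-1}, x_i), H(x) = I_n."""
    m = pad(x)
    I = IV
    B = L // 8
    for i in range(0, len(m), B):
        I = F(I, m[i:i + B])
    return I
```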

Suppose we have two strings $x \ne y$, such that $H(x) = H(y)$; i.e., a

vectors for hash functions have such obviously regular forms.)

the $l$-bit blocks of two distinct (padded) messages, and without loss
of generality suppose that $n \ge n'$. Let $I_0 = I'_0 = I$, let

We prove the result by induction on $n$. The case $n = 0$ is trivially
true, since there is only one zero-block message. Suppose, then, that the
result is true for $n$-block messages. There are three cases to consider.

messages, and because the remaining messages must differ in at least one
block, we can apply the inductive hypothesis on these shorter messages to
complete the proof.

\vfil

\vfil

The statement usually made about a `good' hash function $H$ is that it
should be `difficult' to find a collision: i.e., two preimages $x \ne y$
where $H(x) = H(y)$. How do we formalize this? Here's one attempt:

already `knows' a collision for $H$ and just outputs the right answer.
It succeeds very quickly, and with probability 1. So this definition is
impossible to satisfy.

The concept of targeted collision resistance is relatively new, but quite

functions. They're not really keyed, because the indices aren't kept
secret.

When making a signature, an index $i$ is chosen at random, and the
signature for message $m$ is formed over the pair $(i, H_i(m))$.

TCR-hash functions are the subject of ongoing research. No practical
designs exist at the moment.

Consider the following experiment:

$i \getsr \keys H$; \\

The security of a TCR-hash function is measured as:

where the maximum is taken over all adversaries running in time $t$. We
define $(t, \epsilon)$-security as usual.

In practice, we expect much more than just collision resistance from hash
functions: we expect a certain amount of `random' behaviour. But this is
hard to quantify.

One approach is to model a hash function as a `random oracle', i.e., an
oracle containing a function chosen at random, used by the construction
under consideration, and made available to the adversary. The idea is that
reductions can `capture' the knowledge of an adversary by examining the
queries it makes to its random oracle.

better than nothing.

There are a number of `standard' assumptions that are made about the
difficulty of various problems:

\item IFP, the Integer Factorization Problem;
\item QRP, the Quadratic Residuosity Problem;
\item DLP, the Discrete Logarithm Problem;
\item RSAP, the RSA Problem; and
\item CDH, the Computational Diffie-Hellman problem and its variants

\resetseq

We often assume that large integers of the form $n = p q$, where $p$ and
$q$ are primes of roughly the same size, are `hard' to factor.
Specifically, there is no algorithm which will factor such an $n$ in time
bounded by a polynomial function of $\log n$.

The difficulty of various other problems, e.g., Quadratic Residuosity, or
RSA, depends on the difficulty of factoring; however, it is not yet known
whether the ability to solve QRP or RSAP can be used to factor.

The problem of extracting square roots modulo $n = p q$ is provably as hard
as factoring. This is the basis of Rabin's public key encryption and
digital signature schemes. We shall analyse these later.

Suppose $Q(n, y)$ is an algorithm which returns an $x$ such that $x^2

nontrivial factor of $n$ as follows:

$x' \gets Q(n, y)$; \\
$p \gets \gcd(x + x', n)$; \\

\FOREVER;

The program attempts to find two square roots of $y$ mod $n$. It's easy to

a factorization of $k n$. It remains to prove the probability bound on $x
+ x'$ being a nontrivial factor of $n$.

$n$.

Firstly, we claim that, if $p$ is an odd prime then the congruence $x^2

There must exist odd primes $p$, $q$, such that $p|n$ and $q|n$, and $x

since $q$ is odd. Hence, $p$ divides $x + y$, but $q$ does not, so $\gcd(x
+ y, n)$ is a nontrivial factor of $n$, completing the proof.
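The reduction above can be sketched in Python. The oracle $Q$ is assumed,
exactly as in the text; each pass succeeds with probability at least 1/2,
so the expected number of trials is small:

```python
import math, random

def factor_with_sqrt_oracle(n, Q):
    """Find a nontrivial factor of n = pq, given an oracle Q(n, y)
    returning *some* square root of y mod n."""
    while True:
        x = random.randrange(1, n)
        g = math.gcd(x, n)
        if g != 1:
            return g                 # extraordinarily lucky: x shares a factor
        y = (x * x) % n
        xp = Q(n, y)                 # the oracle's root: xp^2 = x^2 (mod n)
        p = math.gcd(x + xp, n)
        if 1 < p < n:
            return p                 # xp != +-x (mod n), so gcd splits n
```

With probability 1/2 the oracle's root $x'$ satisfies $x' \ne \pm x$, and
then $\gcd(x + x', n)$ splits $n$, just as in the proof.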

whether $x$ is a quadratic residue mod $p$:

0 & if $p$ divides $x$ \\
-1 & if $x$ is a quadratic nonresidue mod $p$ \\
+1 & if $x$ is a quadratic residue mod $p$

This can be efficiently computed without knowledge of the factors of $n$

quadratic residue or it might not; if not, then we say that $x$ is a

at random, then

since we have

with each occurring with equal probability.
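The efficient computation alluded to above works by quadratic reciprocity;
a standard sketch in Python (our code, not the notes'):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic
    reciprocity -- no factorization of n required."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of two:
            a //= 2                    # (2/n) = -1 iff n = 3, 5 (mod 8)
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # reciprocity: flip the sign when
        if a % 4 == 3 and n % 4 == 3:  # both values are 3 (mod 4)
            result = -result
        a %= n
    return result if n == 1 else 0
```

For a prime modulus this agrees with the Legendre symbol; e.g., `jacobi(3, 7)`
is $-1$, since the squares mod 7 are 1, 2 and 4.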

The problem of distinguishing pseudosquares from quadratic residues is
called the Quadratic Residuosity Problem (QRP). It is not known how to
solve this problem without factoring $n$.

The (Integer) Discrete Logarithm Problem asks for the solution $x$ given a

difficult as factoring. (The ability to solve discrete logs modulo $n$ is
sufficient to factor $n$. The best known algorithms for IFP and DLP have
the same running time.)

The problem generalizes to other cyclic groups, e.g., elliptic curves over
finite fields.

\resetseq

The problems of square-root extraction, deciding quadratic residuosity, the
RSA problem, and finding discrete logarithms share the property of being

probability distribution of problem instances}, such that a solution to
one of the derived instances yields a solution to the original one.

This is a good property to have. It implies that `most' problem instances
are as hard as the hardest instances.

The RSA problem is to compute $e$-th roots modulo $n = p q$, where $e$ is
relatively prime to $(p - 1)(q - 1)$. Suppose that the algorithm $S(n, e,
y)$ returns a

the special symbol $\bot$ otherwise. The following probabilistic algorithm
then solves the RSA problem for arbitrary $y$:

$x \gets S(n, e, y')$; \\

\FOREVER;
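The algorithm is a random self-reduction by blinding; a Python sketch, with
the oracle $S$ assumed as in the text (`None` stands in for $\bot$):

```python
import math, random

def solve_rsa(n, e, y, S):
    """Compute the e-th root of an *arbitrary* y mod n, given an oracle
    S that succeeds on some decent fraction of random instances."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) != 1:
            continue                        # (such an r would actually factor n)
        yp = (y * pow(r, e, n)) % n         # blind: y' = y r^e is uniformly random
        x = S(n, e, yp)
        if x is not None:
            return (x * pow(r, -1, n)) % n  # unblind: y'^(1/e) / r = y^(1/e)
```

Because $y' = y r^e$ is uniformly distributed when $r$ is, the oracle sees a
random instance on every trial, whatever $y$ we started from.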

probabilistic algorithm which solves the computational Diffie-Hellman
problem in time polynomial in $\log q$.

Obviously, being able to compute discrete logs in $G$ efficiently would
yield a solution to the Diffie-Hellman problem. But it may be the case
that the Diffie-Hellman problem is easier than the discrete log problem.

The Diffie-Hellman problem is self-reducible.

The computational Diffie-Hellman assumption makes a statement only about

Diffie-Hellman is frequently used for key-exchange, what can we say about
the ability of an adversary to guess any of the bits?

The Decisional Diffie-Hellman (DDH) assumption asserts that, if you don't

random element of $G$; that is, that the distributions of the following
experiments are computationally indistinguishable:

\next

If $A$ is an algorithm attempting to solve DDH in the group $G$, then we
write its advantage as

and the insecurity of DDH in $G$ as

with the maximum over all algorithms $A$ running in time $t$.

If you can solve the computational Diffie-Hellman problem, you can solve
the decisional version. If you can compute discrete logs, you can solve
both.

be) hard, but the decisional problem is easy. In particular, if the group
order $q$ has small factors then the decisional problem isn't hard.

\endinput

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "ips"
%%% End: