An elderly Frenchman rises every morning at 5, goes out to the street in
front of his house, and sprinkles a white powder up and down the street.

One day, a neighbour, who has watched his routine for many years, confronts
him.  `What is this powder you sprinkle on the street every morning,
Pierre?'

`It is elephant powder,' replies the old man.  `It keeps the elephants
away.'

`But Pierre,' says the neighbour.  `Everybody knows that there are no
elephants in France.'

Says the older man matter-of-factly, `I guess it must be working, then.'

The joke has its mandatory corresponding serious message.  Cryptography is
like elephant powder.  If it seems to work, and keeps the attackers -- our
elephants -- away, then we trust it.  This isn't really very satisfactory.
When the elephant powder fails, a 2-ton elephant turns up, with three of
its chums in a Mini, and you notice it hiding upside-down in your bowl of
custard.  Let's face it: elephants aren't good at being surreptitious.

But if your cryptography is no good, you may never know.

So, what can we do about the situation?
\begin{itemize}
\item Design simple cryptographic primitives, e.g., block ciphers, hash
  functions.  Develop techniques for analysing them, and attempt to stay
  ahead of the bad guy.
\item Build useful constructions from trusted primitives in a modular way.
\end{itemize}
Here we look at this second part of the approach.

\begin{itemize}
\item Define notions of security.  Now we know what to aim for, and what
  to expect of our components.
\item Prove how the security of a construction relates to the security of
  its primitives.  If it breaks, we know who to blame.
\end{itemize}

adversary's output (for some input) follows a probability distribution.
We examine the probability with which this happens.
\begin{itemize}
\item Oracles compute using secrets hidden from the adversary.
\item We can restrict the types of queries the adversary makes.
\end{itemize}
Oracles are written as superscripts.  For example, an adversary $A$ given
an oracle computing a function $f(\cdot)$ is written $A^{f(\cdot)}$.

\resetseq

function of $k$.

We say that a function is (asymptotically) secure in some sense if, for
any polynomial $p(k)$ there is a negligible function $\nu(k)$ such that,
for any construction in the family, parameterized by $k$, no adversary
which runs for time $p(k)$ has success probability better than $\nu(k)$.

Suppose we build an encryption scheme from a one-way function.  We'd like
to prove that the encryption is good if the one-way function is secure.
We do this by contradiction:
\begin{itemize}
\item Suppose an adversary $A$ breaks the encryption scheme with
  better-than-negligible probability.
\item Show how to use $A$ as a subroutine in an algorithm
  to break the one-way function, in polynomial time, and with
  better-than-negligible probability.
\item Claim that this violates the assumption of a secure one-way
  function.
\end{itemize}

This doesn't work with real constructions.  We don't know where the
asymptotics set in, and they can conceal large constants.  It's still
better than nothing.

We constrain the resources we allow the adversary:
\begin{itemize}
\item Running time (including program size).
\item Number of oracle queries.
\item Maximum size of oracle queries.
\end{itemize}
We say that a construction is $(t, q, \epsilon)$-secure if no adversary
which runs in time $t$ and makes $q$ oracle queries can break it with
probability better than $\epsilon$.

This is a much more satisfactory approach.  However, we still have to
trust the security of the underlying primitives.

\resetseq

For our purposes, we can think of sets as being collections of objects.

We use the usual notations for set membership ($x \in X$), intersection
($X \cap Y$), union ($X \cup Y$) and difference ($X \setminus Y$).  The
set-builder notation $\{\, f(x) \mid P(x) \,\}$ denotes the set of those
items $f(x)$ for those $x$ for which the predicate $P(x)$ is true.

The cardinality $|X|$ of a finite set $X$ is the number of elements in
the set.

The power set $\powerset(X)$ of a set $X$ is the set of all subsets of
$X$.

The Cartesian product $X \times Y$ of two sets is the set of ordered
pairs $\{\, (x, y) \mid x \in X, y \in Y \,\}$.  We use exponents to
indicate the product of a set with itself: hence, $X^2 = X \times X$.

A relation $R$ between sets $X$ and $Y$ is a subset of $X \times Y$; we
write $R(x, y)$ if $(x, y) \in R$.  Relations between two sets are often
written as infix symbols: e.g., $x \sim y$.

In addition to strings, defined later, we use the following standard
sets:

$\dom f$ describes the domain of an arbitrary function; $\ran f$
describes its range.

We sometimes apply the function notation to sets, indicating that the
function is applied to each element: $f(X) = \{\, f(x) \mid x \in X \,\}$.

We shall use this view in some of the algorithms we present.  In
addition, we use the ordered pair notation $(x, y)$ to reinforce the
notion of a mapping.  We might write, for example, $f \gets f \cup \{ (x,
y) \}$ to augment a function by the addition of a new mapping.  This is
clearly only valid if $x \notin \dom f$ (or $f(x) = y$) initially.

Suppose $A$ is an alphabet.  The set of sequences of exactly $n$ symbols
from $A$ is written $A^n$.  The length $|a|$ of a string $a$ is the
number $n$ where $a \in A^n$.

If $x$ and $y$ are strings, their concatenation, written $x \cat y$, or
sometimes just $x y$, is the result of writing the symbols in $x$
followed by the symbols in $y$, in order.  We have $|x \cat y| = |x| +
|y|$.

There are natural (injective) mappings between bit strings and natural
numbers.  One is to associate with the string $x = x_0 x_1 \ldots
x_{n-1}$ the natural number $\sum_{0 \le i < n} x_i 2^i$.  The other
natural mapping is $\sum_{0 \le i < n} x_i 2^{n-1-i}$.  It doesn't matter
which you choose, as long as you're consistent.

For simplicity's sake, we shall tend to switch between strings and the
numbers (and occasionally more exotic objects) they represent implicitly.

We'll find it useful to be able to break up strings in the algorithms we
present.  We use a statement of the form
\[ x_0 \cat x_1 \cat \cdots \cat x_k \gets x \]
(annotated with the lengths $n_i$) to mean that the string $x$ is broken
up into individual strings $x_0$, $x_1$, \ldots, $x_k$, such that $|x_i|
= n_i$ for each $i$ and $x = x_0 \cat x_1 \cat \cdots \cat x_k$.

We may omit one of the $n_i$ length indicators, since it can be deduced
from the length of the string $x$ and the other $n_j$.
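As a concrete illustration (my own sketch, not part of the notes), the
splitting statement can be phrased in Python: given a string and a list of
lengths with at most one omitted (here `None`), the missing length is
deduced from the total length.

```python
def split_string(x, lengths):
    """Split x into pieces with the given lengths.

    At most one entry of `lengths` may be None; it is deduced from the
    total length of x, mirroring the convention that one length
    indicator may be omitted."""
    if lengths.count(None) > 1:
        raise ValueError('at most one length may be omitted')
    known = sum(n for n in lengths if n is not None)
    lengths = [len(x) - known if n is None else n for n in lengths]
    if sum(lengths) != len(x):
        raise ValueError('lengths do not sum to the string length')
    pieces, pos = [], 0
    for n in lengths:
        pieces.append(x[pos:pos + n])
        pos += n
    return pieces
```

For example, `split_string('abcdef', [2, None, 1])` deduces the middle
length as 3.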

We shall abuse set membership notation for vectors; i.e., we write $x \in
v$ to mean that $x$ is one of the components of the vector $v$.

When we apply functions to vectors, we mean that they are applied
componentwise.

The support of a distribution is the set of values which occur with
nonzero probability.

We write $x \getsr \mathcal{D}$ to mean that $x$ is chosen independently
at random, according to the distribution $\mathcal{D}$.

\resetseq

distributions.

all possible values $x$ of $X$, we have

Fix a value of $k$.  Again, let $X$ be distributed according to

We say that two families of distributions are computationally
indistinguishable if no `efficient' algorithm can tell them apart with
better than `negligible' probability.

any $k$:

Statistical closeness implies computational indistinguishability.

computationally indistinguishable.

Suppose we throw $q$ balls into $n$ bins at random (with $q \le n$).  Let
$C$ be the event that some ball lands in an occupied bin, and $C_i$ the
event that ball $i$ lands in a bin occupied by a previous one.
Obviously, the worst case for this is when none of the other balls have
collided, so
\[ \Pr[C_i] \le \frac{i - 1}{n}. \]
Then
\[ \Pr[C] \le \sum_{1 \le i \le q} \Pr[C_i]
          \le \sum_{1 \le i \le q} \frac{i - 1}{n}
           = \frac{q (q - 1)}{2 n}. \]

This is an extremely useful result, and we shall need it often.
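A quick empirical check of the bound (my own sketch, not from the notes):
throw $q$ balls into $n$ bins many times and compare the observed
collision rate with $q(q-1)/2n$.

```python
import random

def collision_bound(q, n):
    """The upper bound q(q-1)/2n on the collision probability."""
    return q * (q - 1) / (2 * n)

def collision_rate(q, n, trials=2000, seed=1):
    """Estimate the probability of any collision by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bins = [rng.randrange(n) for _ in range(q)]
        if len(set(bins)) < q:       # some bin was hit twice
            hits += 1
    return hits / trials

q, n = 20, 1000
est, bound = collision_rate(q, n), collision_bound(q, n)
# The estimate should not noticeably exceed the bound (here 0.19).
```

The simulated rate sits a little below the bound, as expected, since the
bound comes from a union bound.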

\resetseq

Intuition: a one-way function is easy to compute but hard to invert.

Now we play a game:
\begin{itemize}
\item Choose $x$ at random, and compute $y = f(x)$.
\item Let $x'$ be the output of $I$ when run on input $y$.
\item We say that $I$ `wins' if $f(x') = y$; otherwise it loses.
\end{itemize}
Note that we don't care whether $x = x'$.

The success probability of an inverter $I$ against the function $f$ is
\[ \Pr[x \getsr \dom f;
       y \gets f(x);
       x' \gets I(y) :
       f(x') = y] \]
where the probability is taken over all choices of $x$ and all coin flips
made by $I$.

We measure the insecurity of the one-way function by maximizing over
possible inverters, where the maximum is taken over all $I$ running in
time $t$.
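The game can be phrased directly as code.  This sketch is my own, with a
toy (and thoroughly insecure) $f$ based on modular exponentiation; it
makes the point that the inverter wins if it finds *any* preimage, not
necessarily the original $x$.

```python
import random

def owf_game(f, inverter, domain_size, rng):
    """Play one round of the inversion game; return True if I wins."""
    x = rng.randrange(domain_size)      # choose x at random
    y = f(x)                            # give y = f(x) to the inverter
    x_prime = inverter(y)               # the inverter's guess
    return f(x_prime) == y              # we don't care whether x == x'

# Toy `one-way' function: exponentiation in a tiny group.  With such a
# small modulus it is trivially invertible, of course.
P, G = 101, 2
f = lambda x: pow(G, x, P)

def brute_force_inverter(y):
    # Exhaustive search: always succeeds, but takes time ~ domain size.
    return next(x for x in range(P) if f(x) == y)

rng = random.Random(0)
wins = sum(owf_game(f, brute_force_inverter, P, rng) for _ in range(50))
```

The exhaustive inverter has success probability 1, illustrating why the
definition must also bound the inverter's running time.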

A trapdoor one-way function has a secret which makes inverting the
one-way function easy.  This is most useful when the one-way function is
a member of a family.  A trapdoor one-way function $(G, f, T)$ is a
triple of algorithms:
\begin{itemize}
\item The probabilistic algorithm $G$ is given some parameter $k$ and
  returns a pair $(P, K)$, containing the public parameters $P$ for
  computing an instance of the one-way function, and the secret trapdoor
  information $K$ for inverting it.  We write $(P, K) \in G(k)$.
\item The algorithm $f$ implements the one-way function.  That is, if
  $(P, K) \in G(k)$, then $f_P(\cdot)$ is a one-way function.
\item The algorithm $T$ inverts $f$ using the trapdoor information.  That
  is, if $(P, K) \in G(k)$ and $y = f_P(x)$ for some $x$, then $y =
  f_P(T_K(y))$.
\end{itemize}
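The $(G, f, T)$ interface can be sketched with a toy RSA instance (my
illustration; the RSA problem is discussed later in the notes, and real
parameter generation is far more careful than this).

```python
import math, random

def G(k, rng):
    """Toy parameter generation: two small primes of about k bits."""
    def rand_prime():
        while True:
            p = rng.randrange(2 ** (k - 1), 2 ** k)
            if all(p % d for d in range(2, math.isqrt(p) + 1)):
                return p
    p = rand_prime()
    q = rand_prime()
    while q == p:
        q = rand_prime()
    n, phi = p * q, (p - 1) * (q - 1)
    e = 3
    while math.gcd(e, phi) != 1:
        e += 2
    d = pow(e, -1, phi)          # the trapdoor: a decryption exponent
    return (n, e), d             # P = (n, e) public, K = d secret

def f(P, x):
    """The one-way function f_P."""
    n, e = P
    return pow(x, e, n)

def T(P, K, y):
    """Invert f_P using the trapdoor K."""
    n, _ = P
    return pow(y, K, n)

rng = random.Random(42)
P, K = G(10, rng)
x = rng.randrange(1, P[0])
assert f(P, T(P, K, f(P, x))) == f(P, x)   # y = f_P(T_K(y))
```

Note the final assertion checks exactly the condition from the third item
above: the trapdoor recovers *a* preimage of $y$.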

A pseudorandom generator (PRG) `stretches' an input seed into a longer
string which `looks' random.

The advantage of a distinguisher $D$ against a PRG $G$ is the difference
between the probability that $D$ returns $1$ given the output of $G$ on
a random seed, and the probability that $D$ returns $1$ given a uniformly
random string of the same length.  The insecurity of $G$ is the maximum
advantage, where the maximum is taken over all distinguishers $D$ running
in time $t$; if this is at most $\epsilon$, we say that $G$ is a
$(t, \epsilon)$-secure PRG\@.

$k \ge L$.

\item Show that trivial PRGs exist.
\item Show that if $g$ is nontrivial, then $g$ is also a one-way function,
  with

\item The identity function $I(x) = x$ is a trivial PRG, with

is the output of $g$, then $A$ inverts $y$ with probability

security bound than this simplistic analysis.

Let $g$ be a length-doubling generator; let $g_0(\cdot)$ be the first $k$
bits of its output, and $g_1(\cdot)$ be the second $k$ bits.  Define the
sequence of generators
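The construction can be sketched in code (my illustration; here SHA-256
stands in for the length-doubling generator $g$ purely for convenience --
the construction itself only assumes *some* secure $g$): output $g_0$ of
the current state, and continue on $g_1$.

```python
import hashlib

K = 16  # seed length in bytes (my choice for the sketch)

def g(x):
    """Stand-in length-doubling generator: 16 bytes -> 32 bytes."""
    return hashlib.sha256(b'prg' + x).digest()

def g0(x): return g(x)[:K]   # first half of g's output
def g1(x): return g(x)[K:]   # second half

def stretch(seed, nblocks):
    """Iterate: output g0 of the current state, recurse on g1.

    If g is a secure length-doubling PRG, a hybrid argument shows the
    concatenated output is also pseudorandom."""
    out, state = [], seed
    for _ in range(nblocks):
        out.append(g0(state))
        state = g1(state)
    return b''.join(out)
```

A pleasant property of this construction is that each output is a prefix
of any longer one from the same seed.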

Advantage is a concept used in many definitions of security notions:

\item Zero means that the adversary couldn't distinguish.
\item Negative advantage means the adversary got them the wrong way
  around.  There is another adversary which uses the same resources but
  has positive advantage.

Let $b$ be the bit that $A$ returns, and let $b^*$ be the experiment's
hidden bit.  Then

Addressing the above claims in order:

at most 1.

All present and correct.

We want to say that $F$ is a strong PRF if adversaries find it hard to
distinguish an instance $F_K$ from a function chosen completely at random
with the same `shape'.

We give the adversary an oracle, either for a randomly selected $F_K$, or
for a completely random function, and ask it to try and say which it is
given.

We define the advantage of a distinguisher $D$ against the PRF $F$ as
follows:

The insecurity of the PRF is then measured as

where the maximum is taken over all distinguishers $D$ which run for time
$t$ and make at most $q$ oracle queries.
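The distinguishing experiment can be phrased as code (my own sketch, not
the notes': a hash with a prepended key stands in for the PRF instance,
and the random function is lazily sampled).  The advantage is estimated
empirically over many runs.

```python
import hashlib, random

def F(key, x):
    """Stand-in PRF instance F_K (illustrative only)."""
    return hashlib.sha256(key + x).digest()[:8]

def prf_game(distinguisher, b, rng):
    """Run the game with hidden bit b; return the distinguisher's guess."""
    if b == 1:
        key = rng.randbytes(16)
        oracle = lambda x: F(key, x)
    else:
        table = {}
        def oracle(x):                  # lazily-sampled random function
            if x not in table:
                table[x] = rng.randbytes(8)
            return table[x]
    return distinguisher(oracle)

def advantage(distinguisher, runs=400, seed=0):
    rng = random.Random(seed)
    p1 = sum(prf_game(distinguisher, 1, rng) for _ in range(runs)) / runs
    p0 = sum(prf_game(distinguisher, 0, rng) for _ in range(runs)) / runs
    return p1 - p0

# A hopeless distinguisher: guesses from one bit of one query.
hopeless = lambda oracle: oracle(b'x')[0] & 1
```

The hopeless distinguisher's estimated advantage hovers around zero, as
the definition demands for any strategy that ignores the oracle's
structure.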

The advantage of a distinguisher $D$ against the PRP $F$ is defined as
for PRFs, and the notion of $(t, q, \epsilon)$-security is the same.

For a super PRP, the distinguisher is allowed to make inverse queries:

Since there are two oracles, we count queries to both when evaluating the
insecurity:

where the maximum is taken over all distinguishers $D$ which run for time
$t$, make $q$ queries to the standard oracle, and $q'$ queries to the
inverse oracle; if this is at most $\epsilon$, we say that $F$ is a
$(t, q, q', \epsilon)$-secure super PRP\@.

Note that the key length hasn't appeared anywhere in the definition of
insecurity for a PRP\@.  Derive lower bounds for the insecurity of a PRP
with a $k$-bit key.

\resetseq

We model block ciphers as families of PRPs (not super PRPs).  Most of the
analysis works best on PRFs, though.  We show that a PRP makes a `pretty
good' PRF, as long as it's not over-used.

Let $F$ be any PRP family.  Then the insecurity of $F$ as a PRF exceeds
its insecurity as a PRP by at most $q(q - 1)/2^{L+1}$.  Hence, if $q$ is
small compared with $2^{L/2}$ -- where $L$ is the block size -- then a
PRP makes a good PRF.

This is the `birthday bound', and we shall meet it often when we examine
modes of operation.  We shall examine the proof, because it illustrates
some useful techniques.

This handy lemma states that the difference in the probability of some
outcome between the two games is bounded above by the probability that
the games differ.

We have:

Subtracting gives

as required.

Suppose that $F$ is a pseudorandom permutation.  We aim to show that $F$
is also a pseudorandom function.

Let $A$ be an adversary which distinguishes~$F$ from a pseudorandom
function in time~$t$, after making $q$ oracle queries.  We consider a
sequence of games played with the adversary, and for each game we let
$S_i$ be the event that the adversary returns~$1$ at the end of the game.

Game~$\G0$ is the `random function' game.  We give $A$ an oracle
computing a function chosen completely at random.

Game~$\G1$ is the `PRF' game.  We give $A$ an oracle which computes
$F_K$ for a randomly chosen key.

By definition, then, the advantage of $A$ is $\Pr[S_1] - \Pr[S_0]$.

In game~$\G2$, we watch for a collision among the answers returned by the
random-function oracle; let $F_2$ be the event that this occurs.

By the difference lemma, we must have

Using the earlier result on collisions, it's easy to see that
\[ \Pr[F_2] \le \frac{q (q - 1)}{2^{L+1}}. \]

If $F_2$ doesn't occur in $\G2$ (i.e., there is no collision), then the
random function is indistinguishable from a random permutation.  So

By definition, we have

We can now tie all of this together.

A simple calculation shows that

In the second line, we used the bound we computed on the absolute
difference above; in the fourth, the definition of advantage
against a PRP; and in the fifth we used the definition of insecurity for
a PRP\@.

Finally, we imposed no restrictions on $A$, except that it run in time
$t$ and make $q$ oracle queries.  So our bound applies to the most
successful adversary running with those resource bounds.

Hence, we can maximize, showing that

as required.
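Numerically, the $q(q-1)/2^{L+1}$ term explains the usual advice about
64-bit block ciphers.  This little helper (my own, not from the notes)
evaluates it:

```python
def switching_gap(q, L):
    """The q(q-1)/2^(L+1) term separating PRP- and PRF-insecurity."""
    return q * (q - 1) / 2 ** (L + 1)

# For a 64-bit block cipher, the gap is negligible for modest q but
# becomes substantial around q = 2^32 -- the `birthday' threshold.
gap_small = switching_gap(2 ** 16, 64)     # about 2^-33: negligible
gap_birthday = switching_gap(2 ** 32, 64)  # just under 1/2
```

So after about $2^{32}$ queries to a 64-bit PRP, the switching bound no
longer guarantees anything useful about its behaviour as a PRF.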

\resetseq

Hash functions like MD5 and SHA-1 are extremely useful primitives.  What
properties do we expect of them?  This turns out to be an extremely
difficult question to answer.
\begin{itemize}
\item One-wayness.  We've seen a definition for this already.  But it's
  not enough.
\item Collision-resistance.  This is the property usually claimed as the
  requirement for use in digital signature systems.  We'll look at this
  later.
\item Randomness.  What does this mean, when the function is completely
  public?  A distinguishability criterion is clearly hopeless.
\end{itemize}

An iterated hash function is built from a compression function $F$ and an
initial value $I$; it transforms an input string $x$ as follows:
\begin{itemize}
\item Pad $x$ to a multiple of $L$ bits in some injective way.  Divide
  the padded string into $L$-bit blocks $x_0$, $x_1$, \ldots, $x_{n-1}$.
\item Let $I_0 = I$, and compute $I_{i+1} = F(I_i, x_i)$ for each block.
\item The result $H(x) = I_n$.
\end{itemize}

Suppose we have two strings $x \ne y$, such that $H(x) = H(y)$; i.e., a
collision for $H$.  Then we can find a collision for the compression
function $F$.  (Note that few attack vectors for hash functions have such
obviously regular forms.)

the $l$-bit blocks of two distinct (padded) messages, and without loss of
generality suppose that $n \ge n'$.  Let $I_0 = I'_0 = I$, let

We prove the result by induction on $n$.  The case $n = 0$ is trivially
true, since there is only one zero-block message.  Suppose, then, that
the result is true for $n$-block messages.  There are three cases to
consider.

messages, and because the remaining messages must differ in at least one
block, we can apply the inductive hypothesis on these shorter messages to
complete the proof.
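The iterated construction can be sketched in code (my toy version, not
the notes': a hash of the chaining value and block stands in for the
compression function, and the padding appends a marker byte and the
message length -- a crude stand-in for real length-strengthening
padding).

```python
import hashlib

BLOCK = 16  # L/8: block size in bytes (my choice)

def compress(chain, block):
    """Toy compression function F(I_i, x_i) -> I_{i+1}."""
    return hashlib.sha256(chain + block).digest()[:8]

def pad(x):
    """Injective padding: append 0x80, zeros, then the 8-byte length."""
    n = (-(len(x) + 1 + 8)) % BLOCK
    return x + b'\x80' + b'\x00' * n + len(x).to_bytes(8, 'big')

def iterated_hash(x, iv=b'\x00' * 8):
    chain = iv                           # I_0 = I
    padded = pad(x)
    for i in range(0, len(padded), BLOCK):
        chain = compress(chain, padded[i:i + BLOCK])  # I_{i+1} = F(I_i, x_i)
    return chain                         # H(x) = I_n
```

The collision argument above applies directly: any collision in
`iterated_hash` yields a collision in `compress` (or in the padding,
which injectivity rules out).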

The statement usually made about a `good' hash function $H$ is that it
should be `difficult' to find a collision: i.e., two preimages $x \ne y$
where $H(x) = H(y)$.  How do we formalize this?  Here's one attempt:

But consider an adversary which already `knows' a collision for $H$ and
just outputs the right answer.  It succeeds very quickly, and with
probability 1.  So this definition is impossible to satisfy.

The concept of targeted collision resistance is relatively new, but quite
promising.  The idea is to use a family of hash functions.  They're not
really keyed, because the indices aren't kept secret.

When making a signature, an index $i$ is chosen at random, and the
signature for message $m$ is formed over the pair $(i, H_i(m))$.

TCR-hash functions are the subject of ongoing research.  No practical
designs exist at the moment.

Consider the following experiment:

$i \getsr \keys H$; \\

The security of a TCR-hash function is measured as:

where the maximum is taken over all adversaries running in time $t$.  We
define $(t, \epsilon)$-security as usual.

In practice, we expect much more than just collision resistance from hash
functions: we expect a certain amount of `random' behaviour.  But this is
hard to quantify.

One approach is to model a hash function as a `random oracle', i.e., an
oracle containing a function chosen at random, used by the construction
under consideration, and made available to the adversary.  The idea is
that reductions can `capture' the knowledge of an adversary by examining
the queries it makes to its random oracle.

No real hash function behaves like a random oracle, so a proof in this
model is of limited value; but it's better than nothing.
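In proofs, the random oracle is usually implemented by `lazy sampling':
fresh queries get fresh random answers, and repeated queries are answered
consistently.  A sketch (mine, not the notes'):

```python
import random

class RandomOracle:
    """A lazily-sampled random oracle with fixed-length outputs.

    A reduction can inspect `self.queries` to `capture' everything the
    adversary has asked about."""
    def __init__(self, out_len=8, seed=None):
        self.rng = random.Random(seed)
        self.out_len = out_len
        self.queries = {}
    def __call__(self, x):
        if x not in self.queries:                       # fresh query:
            self.queries[x] = self.rng.randbytes(self.out_len)
        return self.queries[x]                          # repeat: consistent

H = RandomOracle(seed=7)
```

The `queries` table is exactly the `captured knowledge' mentioned above:
the adversary cannot know anything about $H$ at points it never queried.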

There are a number of `standard' assumptions that are made about the
difficulty of various problems:
\begin{itemize}
\item IFP, the Integer Factorization Problem;
\item QRP, the Quadratic Residuosity Problem;
\item DLP, the Discrete Logarithm Problem;
\item RSAP, the RSA Problem; and
\item CDH, the Computational Diffie-Hellman problem and its variants.
\end{itemize}

\resetseq

We often assume that large integers of the form $n = p q$, where $p$ and
$q$ are primes of roughly the same size, are `hard' to factor.
Specifically, there is no algorithm which will factor such an $n$ in time
bounded by a polynomial function of $\log n$.

The difficulty of various other problems, e.g., Quadratic Residuosity, or
RSA, depends on the difficulty of factoring; however, it is not yet known
whether the ability to solve QRP or RSAP can be used to factor.

The problem of extracting square roots modulo $n = p q$ is provably as
hard as factoring.  This is the basis of Rabin's public key encryption
and digital signature schemes.  We shall analyse these later.

Suppose $Q(n, y)$ is an algorithm which returns an $x$ such that $x^2
\equiv y \pmod{n}$, if one exists.  Then we can find a nontrivial factor
of $n$ as follows:

$x' \gets Q(n, y)$; \\
$p \gets \gcd(x + x', n)$; \\

\FOREVER;

The program attempts to find two square roots of $y$ mod $n$.  It's easy
to see that $x^2 \equiv x'^2 \pmod{n}$, so $(x - x')(x + x')$ is
a factorization of $k n$ for some integer $k$.  It remains to prove the
probability bound on $x + x'$ being a nontrivial factor of $n$.

$n$.

Firstly, we claim that, if $p$ is an odd prime then the congruence $x^2

There must exist odd primes $p$, $q$, such that $p|n$ and $q|n$, and $x

since $q$ is odd.  Hence, $p$ divides $x + y$, but $q$ does not, so
$\gcd(x + y, n)$ is a nontrivial factor of $n$, completing the proof.
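The reduction is easy to demonstrate when we simulate the oracle $Q$
ourselves.  In this sketch (my illustration), $Q$ knows the factors and
always returns the *same* root of $y$, independently of the caller's $x$
-- which is what gives each iteration a good chance of a nontrivial
factor.

```python
import math, random

P_, Q_ = 7, 11           # the secret factors (both 3 mod 4: easy roots)
N = P_ * Q_

def Q_oracle(n, y):
    """Square-root oracle: knows the factors; returns some root of y."""
    # Roots mod p and q via the (p+1)/4 exponent (valid for p = 3 mod 4).
    rp = pow(y, (P_ + 1) // 4, P_)
    rq = pow(y, (Q_ + 1) // 4, Q_)
    # Combine with the CRT; the oracle's choice of combination does not
    # depend on which root the caller started from.
    for x in range(n):
        if x % P_ == rp and x % Q_ == rq:
            return x

def factor(n, rng):
    """The factoring algorithm from the notes, using the oracle."""
    while True:
        x = rng.randrange(1, n)
        y = (x * x) % n
        x_prime = Q_oracle(n, y)
        p = math.gcd(x + x_prime, n)
        if 1 < p < n:
            return p

rng = random.Random(3)
p = factor(N, rng)
```

Each iteration succeeds whenever $x' \equiv -x$ modulo exactly one of the
two primes, which happens with probability about $1/2$.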

The Legendre symbol indicates whether $x$ is a quadratic residue mod $p$:
\[ \left(\frac{x}{p}\right) = \begin{cases}
     0 & \text{if $p$ divides $x$} \\
    -1 & \text{if $x$ is a quadratic nonresidue mod $p$} \\
    +1 & \text{if $x$ is a quadratic residue mod $p$}
   \end{cases} \]

This can be efficiently computed without knowledge of the factors of $n$.

quadratic residue or it might not; if not, then we say that $x$ is a
pseudosquare.  If $x$ is chosen at random, then

since we have

with each occurring with equal probability.
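Euler's criterion gives a direct way to compute the Legendre symbol, and
Python's built-in three-argument `pow` makes it short (my sketch; for the
Jacobi symbol modulo a composite one would instead use the
quadratic-reciprocity algorithm, which needs no factorization).

```python
def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, via Euler's criterion:
    x^((p-1)/2) mod p is 0, 1 or p-1."""
    s = pow(x, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

# Sanity check: mod 7 the nonzero squares are 1, 2 and 4.
squares = {x for x in range(1, 7) if legendre(x, 7) == 1}
```

Note that this method *does* require $p$ to be prime; it is the Jacobi
symbol, not this computation, that is available without the factors.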

The problem of distinguishing pseudosquares from quadratic residues is
called the Quadratic Residuosity Problem (QRP).  It is not known how to
solve this problem without factoring $n$.

The (Integer) Discrete Logarithm Problem asks for the solution $x$ to a
congruence $g^x \equiv y \pmod{n}$.  This is provably as difficult as
factoring.  (The ability to solve discrete logs modulo $n$ is sufficient
to factor $n$.  The best known algorithms for IFP and DLP have the same
running time.)

The problem generalizes to other cyclic groups, e.g., elliptic curves
over finite fields.

\resetseq

The problems of square-root extraction, deciding quadratic residuosity,
the RSA problem, and finding discrete logarithms share the property of
being randomly self-reducible: an instance of the problem can be
transformed into many derived instances, \emph{with a uniform
probability distribution of problem instances}, such that a solution to
one of the derived instances yields a solution to the original one.

This is a good property to have.  It implies that `most' problem
instances are as hard as the hardest instances.

The RSA problem is to compute $e$-th roots modulo $n = p q$, where $e$ is
relatively prime to $(p - 1)(q - 1)$.  Suppose that the algorithm $S(n,
e, y)$ returns a solution $x$ with $x^e \equiv y \pmod{n}$ for some
values of $y$, and the special symbol $\bot$ otherwise.  The following
probabilistic algorithm then solves the RSA problem for arbitrary $y$:

$x \gets S(n, e, y')$; \\

\FOREVER;
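The random self-reduction here is blinding: multiply $y$ by $r^e$ for a
random unit $r$, so the derived instance $y' = y r^e$ is uniformly
distributed, and a root $x$ of $y'$ yields the root $x r^{-1}$ of $y$.
A sketch (mine, with toy parameters and a partial oracle that only
answers an arbitrary quarter of instances):

```python
import math, random

P_, Q_, E = 11, 17, 3                    # toy RSA parameters
N = P_ * Q_
D = pow(E, -1, (P_ - 1) * (Q_ - 1))      # used only inside the oracle

def S(n, e, y):
    """Partial oracle: returns an e-th root of y only when y = 0 mod 4
    (an arbitrary quirk standing in for `some values of y'), else None."""
    return pow(y, D, n) if y % 4 == 0 else None

def rsa_self_reduce(n, e, y, rng):
    """Solve x^e = y (mod n) for arbitrary y using the partial oracle:
    blind y with a random r, ask the oracle, unblind the answer."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) != 1:
            continue
        y_prime = (y * pow(r, e, n)) % n   # a uniformly random instance
        x = S(n, e, y_prime)
        if x is not None:
            return (x * pow(r, -1, n)) % n

rng = random.Random(5)
x = rsa_self_reduce(N, E, 15, rng)
```

Since $y'$ is uniform, the oracle's success probability on derived
instances equals its average success probability, which is exactly the
self-reducibility property claimed above.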

The computational Diffie-Hellman assumption for a cyclic group $G$ of
order $q$ states that there is no probabilistic algorithm which solves
the computational Diffie-Hellman problem in time polynomial in $\log q$.

Obviously, being able to compute discrete logs in $G$ efficiently would
yield a solution to the Diffie-Hellman problem.  But it may be the case
that the Diffie-Hellman problem is easier than the discrete log problem.

The Diffie-Hellman problem is self-reducible.

The computational Diffie-Hellman assumption makes a statement only about
the difficulty of computing $g^{ab}$ in its entirety.  Since
Diffie-Hellman is frequently used for key exchange, what can we say about
the ability of an adversary to guess any of the bits?

The Decisional Diffie-Hellman (DDH) assumption asserts that, if you don't
know $a$ or $b$, then $g^{ab}$ is indistinguishable from a random element
of $G$; that is, that the distributions of the following experiments are
computationally indistinguishable:

\next

If $A$ is an algorithm attempting to solve DDH in the group $G$, then we
write its advantage as

and the insecurity of DDH in $G$ as

with the maximum over all algorithms $A$ running in time $t$.

If you can solve the computational Diffie-Hellman problem, you can solve
the decisional version.  If you can compute discrete logs, you can solve
both.

There are groups in which the computational problem is (believed to be)
hard, but the decisional problem is easy.  In particular, if the group
order $q$ has small factors then the decisional problem isn't hard.

\endinput

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "ips"
%%% End: