Netstrings
D. J. Bernstein, djb@pobox.com
19970201


1. Introduction

A netstring is a self-delimiting encoding of a string. Netstrings are
very easy to generate and to parse. Any string may be encoded as a
netstring; there are no restrictions on length or on allowed bytes.
Another virtue of a netstring is that it declares the string size up
front. Thus an application can check in advance whether it has enough
space to store the entire string.

Netstrings may be used as a basic building block for reliable network
protocols. Most high-level protocols, in effect, transmit a sequence
of strings; those strings may be encoded as netstrings and then
concatenated into a sequence of characters, which in turn may be
transmitted over a reliable stream protocol such as TCP.

Note that netstrings can be used recursively. The result of encoding
a sequence of strings is a single string. A series of those encoded
strings may in turn be encoded into a single string. And so on.
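To make the last two paragraphs concrete, here is a minimal C sketch, not part of the original note, that encodes a sequence of strings as concatenated netstrings and then wraps that concatenation in one more netstring. The names ns_append, ns_encode_list and NS_ERR, and the fixed 1024-byte scratch buffer, are hypothetical choices made only for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define NS_ERR ((size_t)-1)   /* hypothetical error marker for this sketch */

/* Append the netstring encoding of s[0..n) to out[pos..cap).
   Returns the new end offset, or NS_ERR if out is too small.
   Any bytes, including NUL, may appear in s. */
static size_t ns_append(char *out, size_t cap, size_t pos,
                        const char *s, size_t n)
{
    char prefix[32];
    int plen;

    if (pos == NS_ERR) return NS_ERR;             /* propagate an earlier error */
    plen = snprintf(prefix, sizeof prefix, "%lu:", (unsigned long)n);
    /* Space check written so that no size_t arithmetic can wrap around. */
    if (plen < 0 || n > cap - pos || cap - pos - n < (size_t)plen + 1)
        return NS_ERR;
    memcpy(out + pos, prefix, (size_t)plen);      /* [len]":" */
    memcpy(out + pos + plen, s, n);               /* [string] */
    out[pos + plen + n] = ',';                    /* "," */
    return pos + (size_t)plen + n + 1;
}

/* Encode each C string in strs[0..count) as a netstring, concatenate the
   results, then encode that concatenation as one outer netstring.
   Returns the total length written to out, or NS_ERR. */
static size_t ns_encode_list(char *out, size_t cap,
                             const char *const *strs, size_t count)
{
    char inner[1024];                             /* hypothetical fixed scratch size */
    size_t ilen = 0, i;

    for (i = 0; i < count; i++)
        ilen = ns_append(inner, sizeof inner, ilen, strs[i], strlen(strs[i]));
    if (ilen == NS_ERR) return NS_ERR;
    return ns_append(out, cap, 0, inner, ilen);
}
```

Encoding "hello world!" and the empty string this way gives "19:12:hello world!,0:,,": the inner netstrings "12:hello world!," and "0:," are 16 and 3 bytes long, 19 in total. ns_encode_list uses strlen() only for convenience in the demo; ns_append itself takes an explicit length and accepts arbitrary bytes.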

In this document, a string of 8-bit bytes may be written in two
different forms: as a series of hexadecimal numbers between angle
brackets, or as a sequence of ASCII characters between double quotes.
For example, <68 65 6c 6c 6f 20 77 6f 72 6c 64 21> is a string of
length 12; it is the same as the string "hello world!".

Although this document restricts attention to strings of 8-bit bytes,
netstrings could be used with any 6-bit-or-larger character set.


2. Definition

Any string of 8-bit bytes may be encoded as [len]":"[string]",".
Here [string] is the string and [len] is a nonempty sequence of ASCII
digits giving the length of [string] in decimal. The ASCII digits are
<30> for 0, <31> for 1, and so on up through <39> for 9. Extra zeros
at the front of [len] are prohibited: [len] begins with <30> exactly
when [string] is empty.

For example, the string "hello world!" is encoded as <31 32 3a 68
65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>, i.e., "12:hello world!,". The
empty string is encoded as "0:,".

[len]":"[string]"," is called a netstring. [string] is called the
interpretation of the netstring.
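A decoder has to recover the interpretation while enforcing every rule above: a nonempty [len], no extra leading zeros, a ":" after the digits, and a "," after exactly [len] bytes. The sketch below is one possible way to do this on an in-memory buffer; the name ns_parse and its three-way return convention are hypothetical. Because data from a stream protocol such as TCP can arrive in pieces, it distinguishes "need more input" from "invalid". The 9-digit cap on [len] is a local policy borrowed from the sample reader in section 3, not part of the definition.

```c
#include <stddef.h>

/* Examine buf[0..avail) for one netstring at its start.
   Returns  > 0: the netstring is complete and occupies that many bytes;
                 *str and *len then give its interpretation (not copied).
   Returns    0: the bytes so far are a valid prefix; more input is needed.
   Returns   -1: the bytes cannot start a valid netstring. */
static long ns_parse(const char *buf, size_t avail,
                     const char **str, size_t *len)
{
    size_t i = 0, n = 0;

    /* [len]: one or more ASCII digits with no extra leading zeros. */
    while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
        if (i == 1 && buf[0] == '0') return -1;  /* only a lone "0" may start with <30> */
        if (i == 9) return -1;                   /* local cap: at most 9 digits */
        n = n * 10 + (size_t)(buf[i] - '0');
        i++;
    }
    if (i == avail) return 0;                    /* more digits or ':' may follow */
    if (i == 0 || buf[i] != ':') return -1;
    i++;                                         /* skip ':' */
    if (avail - i < n + 1) return 0;             /* [string] or ',' not all here yet */
    if (buf[i + n] != ',') return -1;
    *str = buf + i;
    *len = n;
    return (long)(i + n + 1);
}
```

A caller reading from a socket could append received bytes to a buffer, call ns_parse, read more data whenever it returns 0, and after handling the interpretation discard the returned number of bytes from the front of the buffer.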


3. Sample code

The following C code starts with a buffer buf of length len and
prints it as a netstring.

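    /* len is an unsigned long, matching %lu; barf() stands for the
       caller's error handling */
    if (printf("%lu:",len) < 0) barf();           /* [len]":" */
    if (fwrite(buf,1,len,stdout) < len) barf();   /* [string] */
    if (putchar(',') < 0) barf();                 /* "," */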
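For a self-contained variant, here is a sketch of the same writer as a function that reports errors to its caller instead of calling barf(). The name ns_write is hypothetical; the cast to unsigned long matches the %lu conversion and assumes the length fits in an unsigned long.

```c
#include <stdio.h>

/* Write buf[0..len) to f as a netstring.
   Returns 0 on success and -1 on a write error. */
static int ns_write(FILE *f, const char *buf, size_t len)
{
    if (fprintf(f, "%lu:", (unsigned long)len) < 0) return -1;  /* [len]":" */
    if (fwrite(buf, 1, len, f) < len) return -1;                 /* [string] */
    if (putc(',', f) == EOF) return -1;                          /* "," */
    return 0;
}
```

Since stdio output is buffered, a caller that needs to know the bytes actually left the process should also check the result of fflush(f).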

The following C code reads a netstring and decodes it into a
dynamically allocated buffer buf of length len.

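    /* len is an unsigned long and buf a char *; barf() stands for the
       caller's error handling. Note that %lu also skips leading whitespace
       and accepts a sign and extra leading zeros, which section 2 forbids. */
    if (scanf("%9lu",&len) < 1) barf();  /* >999999999 bytes is bad */
    if (getchar() != ':') barf();
    buf = malloc(len + 1);       /* malloc(0) is not portable */
    if (!buf) barf();
    if (fread(buf,1,len,stdin) < len) barf();
    if (getchar() != ',') barf();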
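A self-contained reader in the same spirit might look like the sketch below. Unlike the scanf() version, it reads [len] one byte at a time so that it can reject extra leading zeros, signs and whitespace, as section 2 requires. The name ns_read and the choice to return NULL for every kind of failure are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one netstring from f. On success, return a malloc'd buffer holding
   the interpretation (the caller frees it) and store its length in *lenp.
   Return NULL on malformed input, a length over 9 digits, premature EOF,
   or allocation failure. */
static char *ns_read(FILE *f, size_t *lenp)
{
    unsigned long len = 0;
    int c, digits = 0;
    char *buf;

    /* [len]: nonempty decimal digits, parsed by hand so that extra
       leading zeros, signs and whitespace are all rejected. */
    while ((c = getc(f)) >= '0' && c <= '9') {
        if (digits == 1 && len == 0) return NULL;   /* extra leading zero */
        if (++digits > 9) return NULL;              /* >999999999 bytes is bad */
        len = len * 10 + (unsigned long)(c - '0');
    }
    if (digits == 0 || c != ':') return NULL;

    buf = malloc(len + 1);                          /* malloc(0) is not portable */
    if (!buf) return NULL;
    if (fread(buf, 1, len, f) < len || getc(f) != ',') {
        free(buf);                                  /* short read or missing ',' */
        return NULL;
    }
    *lenp = len;
    return buf;
}
```

The returned buffer is not NUL-terminated, because the interpretation may itself contain any byte, including <00>. The declared length is bounded before malloc() is called, so a hostile peer cannot make the reader grow a buffer without limit.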

Both of these code fragments assume that the local character set is
ASCII, and that the relevant stdio streams are in binary mode.


4. Security considerations

The famous Finger security hole, exploited by the Internet worm of
1988, may be blamed on Finger's use of the CRLF encoding. In that
encoding, each string is simply terminated by CRLF. This encoding has
several problems. Most importantly, it does not declare the string
size in advance. This means that a correct CRLF parser must be
prepared to ask for more and more memory as it is reading the string.
In the case of Finger, a lazy implementor found this to be too much
trouble; instead he simply declared a fixed-size buffer and used C's
gets() function, which has no way to limit how many bytes it stores
into that buffer. The rest is history.

In contrast, as the above sample code shows, it is very easy to
handle netstrings without risking buffer overflow. Thus widespread
use of netstrings may improve network security.