+++ /dev/null
-Netstrings
-D. J. Bernstein, djb@pobox.com
-19970201
-
-
-1. Introduction
-
- A netstring is a self-delimiting encoding of a string. Netstrings are
- very easy to generate and to parse. Any string may be encoded as a
- netstring; there are no restrictions on length or on allowed bytes.
- Another virtue of a netstring is that it declares the string size up
- front. Thus an application can check in advance whether it has enough
- space to store the entire string.
-
- Netstrings may be used as a basic building block for reliable network
- protocols. Most high-level protocols, in effect, transmit a sequence
- of strings; those strings may be encoded as netstrings and then
- concatenated into a sequence of characters, which in turn may be
- transmitted over a reliable stream protocol such as TCP.
-
- Note that netstrings can be used recursively. The result of encoding
- a sequence of strings is a single string. A series of those encoded
- strings may in turn be encoded into a single string. And so on.
-
- In this document, a string of 8-bit bytes may be written in two
- different forms: as a series of hexadecimal numbers between angle
- brackets, or as a sequence of ASCII characters between double quotes.
- For example, <68 65 6c 6c 6f 20 77 6f 72 6c 64 21> is a string of
- length 12; it is the same as the string "hello world!".
-
- Although this document restricts attention to strings of 8-bit bytes,
- netstrings could be used with any 6-bit-or-larger character set.
-
-
-2. Definition
-
- Any string of 8-bit bytes may be encoded as [len]":"[string]",".
- Here [string] is the string and [len] is a nonempty sequence of ASCII
- digits giving the length of [string] in decimal. The ASCII digits are
- <30> for 0, <31> for 1, and so on up through <39> for 9. Extra zeros
- at the front of [len] are prohibited: [len] begins with <30> exactly
- when [string] is empty.
-
- For example, the string "hello world!" is encoded as <31 32 3a 68
- 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>, i.e., "12:hello world!,". The
- empty string is encoded as "0:,".
-
- [len]":"[string]"," is called a netstring. [string] is called the
- interpretation of the netstring.
-
-
-3. Sample code
-
- The following C code starts with a buffer buf of length len and
- prints it as a netstring.
-
- if (printf("%lu:",len) < 0) barf();
- if (fwrite(buf,1,len,stdout) < len) barf();
- if (putchar(',') < 0) barf();
-
- The following C code reads a netstring and decodes it into a
- dynamically allocated buffer buf of length len.
-
- if (scanf("%9lu",&len) < 1) barf(); /* >999999999 bytes is bad */
- if (getchar() != ':') barf();
- buf = malloc(len + 1); /* malloc(0) is not portable */
- if (!buf) barf();
- if (fread(buf,1,len,stdin) < len) barf();
- if (getchar() != ',') barf();
-
- Both of these code fragments assume that the local character set is
- ASCII, and that the relevant stdio streams are in binary mode.
-
-
-4. Security considerations
-
- The famous Finger security hole may be blamed on Finger's use of the
- CRLF encoding. In that encoding, each string is simply terminated by
- CRLF. This encoding has several problems. Most importantly, it does
- not declare the string size in advance. This means that a correct
- CRLF parser must be prepared to ask for more and more memory as it is
- reading the string. In the case of Finger, a lazy implementor found
- this to be too much trouble; instead he simply declared a fixed-size
- buffer and used C's gets() function. The rest is history.
-
- In contrast, as the above sample code shows, it is very easy to
- handle netstrings without risking buffer overflow. Thus widespread
- use of netstrings may improve network security.