codec/codec.3: Minor formatting fix.
.\" -*-nroff-*-
.TH codec 3 "9 January 2009" "Straylight/Edgeware" "mLib utilities library"
.SH NAME
codec \- binary encoding and decoding
.\" @codec_class
.\" @codec_strerror
.\" @null_codec_class
.\" @base64_class
.\" @file64_class
.\" @base64url_class
.\" @base32_class
.\" @base32hex_class
.\" @hex_class
.SH SYNOPSIS
.nf
.B "#include <mLib/codec.h>"
.B "#include <mLib/base64.h>"
.B "#include <mLib/base32.h>"
.B "#include <mLib/hex.h>"

.B "codec_class null_codec_class;"
.B "codec_class base64_class, file64_class, base64url_class;"
.B "codec_class base32_class, base32hex_class;"
.B "codec_class hex_class;"

.BI "const char *codec_strerror(int " err ");"
.fi
.SH DESCRIPTION
The
.B codec
system provides an object-based interface to functions which encode
binary data as plain text and decode the result to recover the original
binary data. The interface makes it easy to support multiple encodings
and select an appropriate one at runtime.
.SS "The codec_class structure"
The
.B codec_class
structure represents a particular encoding format. The structure has
the following members.
.TP
.B "const char *name"
The name of the class, as a null-terminated string. The name should not
contain whitespace characters.
.TP
.BI "codec *(*encoder)(unsigned " flags ", const char *" indent ", unsigned " maxline ")"
Pointer to a function which constructs a new encoder object, of type
.BR codec .
The
.I flags
configure the behaviour of the object; the
.I indent
string is written to separate lines of output; the integer
.I maxline
is the maximum length of line to be produced, or zero to forbid line
breaking.
.TP
.BI "codec *(*decoder)(unsigned " flags ")"
Pointer to a function which constructs a new decoder object, also of
type
.BR codec .
The
.I flags
configure the behaviour of the object.
.PP
The
.I flags
to the
.B encoder
and
.B decoder
functions have the following meanings.
.TP
.B CDCF_LOWERC
For codecs which produce output using a single alphabetic case (e.g.,
.BR base32 ,
.BR hex ),
emit and accept only lower case; the default is to emit and accept only
upper case, for compatibility with RFC4648. If the codec usually
produces mixed-case output, then this flag is ignored.
.TP
.B CDCF_IGNCASE
For codecs which produce output using a single alphabetic case, ignore
the case of the input when decoding. If the codec usually produces
mixed-case output, then this flag is ignored.
.TP
.B CDCF_NOEQPAD
For codecs which usually pad their output (e.g.,
.BR base64 ,
.BR base32 ),
do not emit or accept padding characters. If the codec does not usually
produce padding, or the padding is not redundant, then this flag is
ignored.
.TP
.B CDCF_IGNEQPAD
For codecs which usually pad their output, do not treat incorrect (e.g.,
missing or excessive) padding as an error when decoding. If the codec
does not usually produce padding, or the padding is required for
unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNEQMID
For codecs which usually pad their output, ignore padding characters
wherever they may appear when decoding. Usually padding characters
indicate the end of the input, and further input characters are
considered erroneous. If the codec does not usually produce padding, or
it is impossible to resume decoding correctly having seen padding
characters, then this flag is ignored.
.TP
.B CDCF_IGNZPAD
For codecs which need to pad their input, ignore unusual padding bits
when decoding. (This is not at all the same thing as the padding
characters controlled by the flags above: they deal with padding the
length of the encoding
.I output
up to a suitable multiple of characters; this option deals with padding
of the
.I input
prior to encoding.) If the codec does not add padding bits, or specific
values are required for unambiguous decoding, then this flag is ignored.
.TP
.B CDCF_IGNNEWL
Ignore newline (and carriage-return) characters when decoding: the
default for RFC4648 codecs is to reject newline characters. If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNSPC
Ignore whitespace characters (other than newlines) when decoding: the
default for RFC4648 codecs is to reject whitespace characters. If these
characters are significant in the encoding, then this flag is ignored.
.TP
.B CDCF_IGNINVCH
Ignore any other invalid characters appearing in the input when
decoding.
.TP
.B CDCF_IGNJUNK
Ignore all `junk' in the input. This should suppress almost all
decoding errors.
.PP
If you do not set any of the
.BR CDCF_IGN ...
flags, a decoder should only accept the exact encoding that the
corresponding encoder would produce (with
.I maxline
= 0 to inhibit line-breaking).
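.PP
Because each class carries its own
.BR name ,
a program can choose an encoding at runtime by searching a table of
class pointers. The following is a minimal sketch of such a lookup, not
part of mLib: the structure is redeclared locally so that the fragment
compiles on its own, and the
.B demo_
objects and
.B find_class
are hypothetical names invented for illustration.
.nf
```c
#include <stddef.h>
#include <string.h>

/* Local stand-in mirroring the members described above; with mLib
 * installed you would include <mLib/codec.h> instead. */
typedef struct codec codec;
typedef struct codec_class {
  const char *name;
  codec *(*encoder)(unsigned flags, const char *indent, unsigned maxline);
  codec *(*decoder)(unsigned flags);
} codec_class;

/* Hypothetical placeholder classes: the constructors are left null
 * because only the lookup by name is being shown. */
static const codec_class demo_hex = { "hex", 0, 0 };
static const codec_class demo_base64 = { "base64", 0, 0 };
static const codec_class demo_base32 = { "base32", 0, 0 };

/* Null-terminated table of the classes the program is willing to use. */
static const codec_class *const demo_classes[] = {
  &demo_hex, &demo_base64, &demo_base32, 0
};

/* Return the class called name, or null if there is no such class. */
const codec_class *find_class(const char *name)
{
  const codec_class *const *cc;

  for (cc = demo_classes; *cc; cc++)
    if (strcmp((*cc)->name, name) == 0) return *cc;
  return 0;
}
```
.fi
With the real library, the table would instead hold pointers such as
.BR &hex_class ,
and the caller would go on to invoke the
.B encoder
or
.B decoder
member of the class found.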
.SS "The codec and codec_ops structures"
The
.B codec
structure represents the state of an encoder or decoder, as returned by
the
.B encoder
and
.B decoder
functions described above. It contains a single member.
.TP
.B "const codec_ops *ops"
Pointer to a
.B codec_ops
structure which contains operations and metadata for use with the
encoder or decoder.
.PP
The
.B codec_ops
structure contains the following members.
.TP
.B "const codec_class *c"
Pointer back to the
.B codec_class
which was used to construct the
.B codec
object.
.TP
.BI "int (*code)(codec *" c ", const void *" p ", size_t " sz ", dstr *" d ")"
Encode or decode, using the codec
.IR c ,
the data in the buffer at address
.I p
and continuing for
.I sz
bytes, appending the output to the dynamic string
.I d
(see
.BR dstr (3)).
If the operation was successful, the function returns zero; otherwise it
returns a nonzero error code, as described below.
.TP
.BI "void (*destroy)(codec *" c ")"
Destroy the codec object
.IR c ,
freeing any resources it may hold.
.PP
A codec may buffer its input (e.g., if it needs to see more in order to
decide what output to produce next); it may also need to take special
action at the end of the input (e.g., flushing buffers, and applying
padding). To signal the codec that there is no more input, call the
.B code
function with a null
.I p
pointer. It will then write any final output to
.IR d .
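.PP
The calling convention just described can be sketched with a toy
decoder. This is an illustration under stated assumptions, not mLib
code: the structures are redeclared locally in simplified form (the
back-pointer to the class is omitted), the fixed-size
.B outbuf
stands in for
.BR dstr ,
and every name beginning
.B demo_
or
.B DEMO_
is hypothetical, as are the error values. The toy decodes upper-case
hex and buffers a lone high nibble between calls, so it shows both the
buffering and the final call with a null pointer.
.nf
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so that the sketch compiles without mLib. */
typedef struct outbuf { unsigned char buf[64]; size_t len; } outbuf;
typedef struct codec codec;
typedef struct codec_ops {
  int (*code)(codec *c, const void *p, size_t sz, outbuf *d);
  void (*destroy)(codec *c);
} codec_ops;
struct codec { const codec_ops *ops; };

enum { DEMO_OK, DEMO_ERR_INVCH, DEMO_ERR_TRUNC, DEMO_ERR_FULL };

/* Toy decoder state: the codec comes first so codec * casts back. */
typedef struct demo_hexdec {
  codec c;
  int have;                     /* is a high nibble waiting? */
  unsigned hi;                  /* the waiting high nibble */
} demo_hexdec;

static int hexval(int ch)       /* assumes an ASCII-compatible charset */
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

static int demo_code(codec *c, const void *p, size_t sz, outbuf *d)
{
  demo_hexdec *h = (demo_hexdec *)c;
  const unsigned char *q = p;
  size_t i;
  int v;

  /* A null p means end of input: a dangling nibble is an error. */
  if (!p) return h->have ? DEMO_ERR_TRUNC : DEMO_OK;
  for (i = 0; i < sz; i++) {
    if ((v = hexval(q[i])) < 0) return DEMO_ERR_INVCH;
    if (!h->have) { h->hi = (unsigned)v; h->have = 1; continue; }
    if (d->len >= sizeof(d->buf)) return DEMO_ERR_FULL;
    d->buf[d->len++] = (unsigned char)((h->hi << 4) | (unsigned)v);
    h->have = 0;
  }
  return DEMO_OK;
}

static void demo_destroy(codec *c) { free(c); }

static const codec_ops demo_ops = { demo_code, demo_destroy };

codec *demo_hex_decoder(void)
{
  demo_hexdec *h = malloc(sizeof(*h));

  if (!h) return 0;
  h->c.ops = &demo_ops; h->have = 0; h->hi = 0;
  return &h->c;
}

/* Feed each chunk in turn, then signal end of input with a null p. */
int decode_chunks(const char *const *chunks, size_t n, outbuf *d)
{
  codec *c = demo_hex_decoder();
  int err = DEMO_OK;
  size_t i;

  if (!c) return -1;
  for (i = 0; !err && i < n; i++)
    err = c->ops->code(c, chunks[i], strlen(chunks[i]), d);
  if (!err) err = c->ops->code(c, 0, 0, d);
  c->ops->destroy(c);
  return err;
}
```
.fi
Feeding the chunks `4', `86' and `9' yields the two bytes 0x48 and
0x69, even though no chunk boundary falls between whole bytes. Feeding
`486' alone yields one byte, and the final call then reports the
dangling digit.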
.PP
The following error conditions may be reported.
.TP
.B CDCERR_INVCH
An invalid character was encountered while decoding. This includes
encountering padding characters if padding is disabled using the
.B CDCF_NOEQPAD
flag.
.TP
.B CDCERR_INVEQPAD
Invalid padding characters (e.g., wrong characters, or too few, too
many, or none at all) were found during decoding. This may also
indicate that the input is truncated, even if the codec does not usually
perform output padding.
.TP
.B CDCERR_INVZPAD
Invalid padding bits were found during decoding.
.PP
The
.B codec_strerror
function converts these error codes to brief, (moderately)
human-readable strings.
.SS "Provided codecs"
The library provides a number of standard codecs.
.TP
.B base64
Implements Base64 encoding, as defined by RFC4648. Output is
mixed-case, so the
.B CDCF_LOWERC
and
.B CDCF_IGNCASE
flags are ignored.
.TP
.B file64
Implements a variant of the Base64 encoding which uses
.RB ` % '
in place of
.RB ` / ',
so that its output is suitable for use as a Unix filename.
.TP
.B base64url
Implements the filename- and URL-safe variant of Base64 encoding, as
defined by RFC4648.
.TP
.B base32
Implements Base32 encoding, as defined by RFC4648. Output is in upper
case by default.
.TP
.B base32hex
Implements the extended-hex variant of Base32, as defined by RFC4648.
This encoding preserves the ordering of messages if padding is
suppressed.
.TP
.B hex
Implements hex encoding, defined by RFC4648 under the name Base16. For
compatibility with that specification, output is in upper case by
default.
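.PP
The ordering property of
.B base32hex
can be checked with a small standalone encoder. The following sketch
is written from the RFC4648 alphabet rather than taken from mLib, and
the function names
.B b32hex_nopad
and
.B order_preserved
are hypothetical. It assumes an ASCII-compatible character set, in
which the digits sort before the upper-case letters.
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode sz bytes at p as unpadded base32hex, writing a null-terminated
 * string to out, which needs room for (8*sz + 4)/5 + 1 characters.
 * Returns the number of characters written, not counting the null. */
size_t b32hex_nopad(const unsigned char *p, size_t sz, char *out)
{
  static const char tab[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
  unsigned long acc = 0;        /* bit accumulator */
  unsigned nbits = 0;           /* number of bits held in acc */
  size_t i, n = 0;

  for (i = 0; i < sz; i++) {
    acc = (acc << 8) | p[i]; nbits += 8;
    while (nbits >= 5) {
      nbits -= 5;
      out[n++] = tab[(acc >> nbits) & 31];
    }
    acc &= (1ul << nbits) - 1;  /* keep only the leftover bits */
  }
  /* Pad the final group with zero bits, but emit no `=' characters. */
  if (nbits) out[n++] = tab[(acc << (5 - nbits)) & 31];
  out[n] = 0;
  return n;
}

static int sign(int x) { return (x > 0) - (x < 0); }

/* Nonzero if comparing the encodings as strings gives the same result
 * as comparing the raw messages bytewise, shorter prefix first. */
int order_preserved(const unsigned char *a, size_t asz,
                    const unsigned char *b, size_t bsz)
{
  char ea[64], eb[64];
  size_t m = asz < bsz ? asz : bsz;
  int raw;

  assert(asz <= 32 && bsz <= 32);       /* keep within ea and eb */
  raw = memcmp(a, b, m);
  if (!raw) raw = (asz > bsz) - (asz < bsz);
  b32hex_nopad(a, asz, ea);
  b32hex_nopad(b, bsz, eb);
  return sign(raw) == sign(strcmp(ea, eb));
}
```
.fi
Padded output does not have this property in ASCII, because
`=' sorts after the digits: a message would then compare greater than
the same message extended by a zero byte.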
.SH "SEE ALSO"
.BR bincode (1),
.BR dstr (3),
.BR mLib (3).
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>