.\" -*-nroff-*-
.ie t .ds , \h'\w'\ 'u/2u'
.el .ds , \ \"
.TH bench 3 "9 March 2024" "Straylight/Edgeware" "mLib utilities library"
.\" @bench_createtimer
.\" @bench_init
.\" @bench_destroy
.\" @bench_calibrate
.\" @bench_measure
.
.SH SYNOPSIS
.nf
.B "#include <mLib/bench.h>"
.PP
.ta 2n
.B "struct bench_time {"
.B " unsigned f;"
.B " kludge64 s;"
.B " uint32 ns;"
.B " kludge64 cy;"
.B "};"
.PP
.B "struct bench_timing {"
.B " unsigned f;"
.B " double n;"
.B " double t;"
.B " double cy;"
.B "};"
.PP
.B "struct bench_timerops {"
.BI " void (*describe)(struct bench_timer *" tm ", dstr *" d );
.BI " void (*now)(struct bench_timer *" tm ", struct bench_time *" t_out );
.BI " void (*destroy)(struct bench_timer *" tm );
.B "};"
.PP
.B "struct bench_timer {"
.B " const struct bench_timerops *ops;"
.B "};"
.PP
.B "struct bench_state {"
.B " unsigned f;"
.B " double target_s;"
.B " ..."
.B "};"
.PP
.BI "typedef void bench_fn(unsigned long " n ", void *" ctx );
.PP
.B "#define BTF_TIMEOK ..."
.B "#define BTF_CYOK ..."
.B "#define BTF_CLB ..."
.B "#define BTF_ANY (BTF_TIMEOK | BTF_CYOK)"
.PP
.BI "struct bench_timer *bench_createtimer(const char *" config );
.PP
.BI "int bench_init(struct bench_state *" b ", struct bench_timer *" tm );
.BI "void bench_destroy(struct bench_state *" b );
.BI "int bench_calibrate(struct bench_state *" b );
.ta \w'\fBint bench_measure('u
.BI "int bench_measure(struct bench_state *" b ", struct bench_timing *" t_out ,
.BI " double " base ", bench_fn *" fn ", void *" ctx );
.fi
.
.SH DESCRIPTION
The header file
.B "<mLib/bench.h>"
provides declarations and definitions
for performing low-level benchmarks.
.PP
The `main event' is
.BR bench_measure .
This function will be described in detail later,
but, in brief,
it calls a caller-provided function,
instructing it to run adaptively chosen numbers of iterations,
in order to get a reasonably reliable measurement of its running time,
and then reports its results by filling in a structure.
.PP
To understand this function,
we must first examine all of the pieces involved in making it work.
.
.SS Timers in general
A
.I timer
is a gadget which is capable of reporting the current time,
in seconds (ideally precise to tiny fractions of a second),
and/or in CPU cycles.
A timer is represented by a pointer to an object of type
.BR "struct bench_timer" .
This structure has a single member,
.BR ops ,
pointing to a
.BR "struct bench_timerops" ,
which is a table of function pointers;
typically, a timer has more data following this,
but this fact is not exposed to applications.
.PP
The function pointers in
.B "struct bench_timerops"
are as follows.
The first argument,
named
.IR tm ,
must always point to the timer object itself.
.TP
.IB tm ->ops->describe( tm ", " d )
Write a description of the timer to the dynamic string
.IR d .
.TP
.IB tm ->ops->now( tm ", " t_out )
Store the current time in
.IR t_out .
The
.B "struct bench_time"
used to represent the time reported by a timer
is described in detail below.
.TP
.IB tm ->ops->destroy( tm )
Destroy the timer,
releasing all of the resources that it holds.
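.PP
To make the function-table design concrete,
here is a minimal, self-contained sketch of a timer built on
.BR clock (3).
It does not use mLib:
all of its names
.RB ( toy_timer ,
.BR toy_create_clock_timer ,
and so on)
are hypothetical,
.B uint64_t
stands in for
.BR kludge64 ,
the flag values are placeholders rather than the real
.B BTF_*
constants,
and
.I describe
writes to a plain character buffer rather than a
.BR dstr .
It shows the layout described above:
a public structure whose only member is
.BR ops ,
followed by private data which only the timer's own functions see.
.nf
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Placeholder flag values (hypothetical, not mLib's BTF_* values). */
#define TOY_TIMEOK 1u
#define TOY_CYOK 2u

/* Mirrors struct bench_time, with uint64_t in place of kludge64. */
struct toy_time { unsigned f; uint64_t s; uint32_t ns; uint64_t cy; };

struct toy_timer;
struct toy_timerops {
  void (*describe)(struct toy_timer *tm, char *buf, size_t sz);
  void (*now)(struct toy_timer *tm, struct toy_time *t_out);
  void (*destroy)(struct toy_timer *tm);
};
struct toy_timer { const struct toy_timerops *ops; };

/* Concrete timer: the public part comes first, private data follows. */
struct clock_timer {
  struct toy_timer t;
  unsigned long nreads;               /* private: number of readings taken */
};

static void clock_describe(struct toy_timer *tm, char *buf, size_t sz)
{
  (void)tm;
  assert(sz > 0);
  snprintf(buf, sz, "stdc-clock sketch");
}

static void clock_now(struct toy_timer *tm, struct toy_time *t_out)
{
  struct clock_timer *ct = (struct clock_timer *)tm;  /* recover private part */
  clock_t c = clock();
  double secs;

  ct->nreads++;
  t_out->f = 0;                       /* nothing valid yet */
  if (c == (clock_t)-1) return;       /* clock() failed: report no time */
  secs = (double)c/CLOCKS_PER_SEC;
  t_out->s = (uint64_t)secs;
  t_out->ns = (uint32_t)((secs - (double)t_out->s)*1e9);
  t_out->f = TOY_TIMEOK;              /* this timer never counts cycles */
}

static void clock_destroy(struct toy_timer *tm)
  { free((struct clock_timer *)tm); }

static const struct toy_timerops clock_ops =
  { clock_describe, clock_now, clock_destroy };

struct toy_timer *toy_create_clock_timer(void)
{
  struct clock_timer *ct = malloc(sizeof(*ct));

  if (!ct) return 0;                  /* null pointer on failure */
  ct->t.ops = &clock_ops;
  ct->nreads = 0;
  return &ct->t;
}
```
.fi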
.PP
A time, as reported by a timer, is represented by a
.BR "struct bench_time" .
A passage-of-time measurement is stored in the
.B s
and
.B ns
members, holding seconds and nanoseconds respectively.
(A timer need not have nanosecond precision.
The exact interpretation of the time \(en
e.g., whether it measures wallclock time,
user-mode CPU time,
or total thread CPU time \(en
is a matter for the specific timer implementation.)
A cycle count is stored in the
.B cy
member.
The
.B f
member stores flags:
.B BTF_TIMEOK
is set if the passage-of-time measurement in
.B s
and
.B ns
is valid; and
.B BTF_CYOK
is set if the cycle count
.B cy
is valid.
Neither the time nor the cycle count need be measured
relative to any particular origin,
so only the difference between two readings is meaningful.
The mask
.B BTF_ANY
covers the
.B BTF_TIMEOK
and
.B BTF_CYOK
bits:
hence,
.IB f &BTF_ANY
is nonzero (true)
if the timer returned any valid timing information.
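.PP
Because readings have no fixed origin,
useful measurements come from the difference between two of them.
The following self-contained sketch computes such a difference,
keeping each quantity only if it is valid in both readings.
It does not use mLib:
.B sk_time
mirrors
.B "struct bench_time"
with
.B uint64_t
in place of
.BR kludge64 ,
and the flag values are placeholders,
not the real
.B BTF_*
constants.
.nf
```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values (hypothetical, not mLib's BTF_* values). */
#define SK_TIMEOK 1u
#define SK_CYOK 2u
#define SK_ANY (SK_TIMEOK | SK_CYOK)

/* Mirrors struct bench_time, with uint64_t in place of kludge64. */
struct sk_time { unsigned f; uint64_t s; uint32_t ns; uint64_t cy; };

/* Difference between two readings, as doubles. */
struct sk_delta { unsigned f; double t; double cy; };

/* Compute t1 - t0.  A quantity is valid in the result only if it is
   valid in both readings; t1 is assumed to be the later reading. */
static struct sk_delta sk_diff(const struct sk_time *t0, const struct sk_time *t1)
{
  struct sk_delta d = { 0, 0.0, 0.0 };
  unsigned ok = t0->f & t1->f & SK_ANY;

  if (ok & SK_TIMEOK) {
    assert(t0->ns < 1000000000u && t1->ns < 1000000000u);
    d.t = (double)(t1->s - t0->s) + ((double)t1->ns - (double)t0->ns)*1e-9;
    d.f |= SK_TIMEOK;
  }
  if (ok & SK_CYOK) {
    d.cy = (double)(t1->cy - t0->cy);
    d.f |= SK_CYOK;
  }
  return d;
}
```
.fi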
.
.SS The built-in timer
The function
.B bench_createtimer
constructs and returns a timer.
It takes a single argument,
a string
.IR config ,
from which it reads configuration information.
If
.B bench_createtimer
fails, it returns a null pointer.
.PP
The
.I config
pointer may safely be null,
in which case a default configuration will be used.
Applications
.I should only
set this pointer to a value supplied by a user,
e.g., through a command-line argument,
environment variable, or
configuration file.
.PP
The built-in timer makes use of one or two
.IR subtimers :
a `clock' subtimer to measure the passage of time,
and possibly a `cycle' subtimer to count CPU cycles.
.PP
The configuration string consists of a sequence of words
separated by whitespace.
There may be additional whitespace at the start and end of the string.
The words recognized are as follows.
.TP
.B list
Print a list of the available clock and cycle subtimers
to standard output.
.TP
.BI clock= t , ...
Use the first of the listed clock subtimers
to initialize successfully
as the clock subtimer.
If none of the subtimers can be initialized,
then construction of the timer as a whole fails.
.TP
.BI cycle= t , ...
Use the first of the listed cycle subtimers
to initialize successfully
as the cycle subtimer.
If none of the subtimers can be initialized,
then construction of the timer as a whole fails.
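.PP
The rule that the first listed subtimer to initialize successfully is used
can be sketched as follows.
This is an illustration only, not mLib's parser:
.B sk_pick_first
and the callback type
.B sk_tryinit
are hypothetical,
and
.B sk_only_named
simulates a platform on which just one subtimer works.
Given the word
.BR clock=posix-thread-cputime,stdc-clock ,
for example,
the list after the
.B =
would be walked in order.
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: try to set up the subtimer whose name is the
   len bytes at name; return 0 on success. */
typedef int sk_tryinit(const char *name, size_t len, void *arg);

/* Walk a comma-separated list, e.g. "posix-thread-cputime,stdc-clock",
   and return the index of the first entry that initializes
   successfully, or -1 if none does (so the whole timer would fail). */
static int sk_pick_first(const char *list, sk_tryinit *try_init, void *arg)
{
  const char *p = list;
  int i = 0;

  assert(list != 0);
  for (;;) {
    const char *q = strchr(p, ',');
    size_t len = q ? (size_t)(q - p) : strlen(p);

    if (try_init(p, len, arg) == 0) return i;
    if (!q) return -1;
    p = q + 1;
    i++;
  }
}

/* Example callback: succeed only for the name given in arg, simulating
   a platform where just that one subtimer works. */
static int sk_only_named(const char *name, size_t len, void *arg)
{
  const char *want = arg;

  return strlen(want) == len && strncmp(name, want, len) == 0 ? 0 : -1;
}
```
.fi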
.PP
The clock subtimers are as follows.
Not all of them will be available on every platform.
.TP
.B posix-thread-cputime
Measures the passage of time using
.BR clock_gettime (2),
specifying the
.B CLOCK_\%THREAD_\%CPUTIME_\%ID
clock.
.TP
.B stdc-clock
Measures the passage of time using
.BR clock (3).
Since
.BR clock (3)
is part of the original ANSI\ C standard,
this subtimer should always be available.
However, because it measures processor time used by the whole process,
it may produce unhelpful results
if other threads are running.
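.PP
The two clock sources can be sketched directly,
trying
.B CLOCK_\%THREAD_\%CPUTIME_\%ID
first and falling back to
.BR clock (3),
in the same order as the built-in preference.
This is a standalone illustration, not mLib's implementation;
the function
.B sk_cpu_seconds
is hypothetical.
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Return CPU time in seconds, preferring the calling thread's CPU-time
   clock and falling back to clock().  *used_fallback is set to 1 if
   clock() was used.  Returns a negative value if neither source works. */
static double sk_cpu_seconds(int *used_fallback)
{
  clock_t c;

  assert(used_fallback != 0);
#ifdef CLOCK_THREAD_CPUTIME_ID
  {
    struct timespec ts;

    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) == 0) {
      *used_fallback = 0;
      return (double)ts.tv_sec + (double)ts.tv_nsec*1e-9;
    }
  }
#endif
  *used_fallback = 1;
  c = clock();
  if (c == (clock_t)-1) return -1.0;
  return (double)c/CLOCKS_PER_SEC;
}
```
.fi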
.PP
The cycle subtimers are as follows.
Not all of them will be available on every platform.
.TP
.B linux-perf-event
Counts CPU cycles using the Linux-specific
.BR perf_event_open (2)
system call to read the
.B PERF_\%COUNT_\%HW_\%CPU_\%CYCLES
counter.
Only available on Linux.
It will fail to initialize
if access to performance counters is restricted,
e.g., because the
.B /proc/sys/kernel/perf_event_paranoid
level is too high.
.TP
.B x86-rdtsc
Counts CPU cycles using the x86
.B rdtsc
instruction.
This instruction is not really suitable for performance measurement:
it gives misleading results on CPUs with variable clock frequency.
.TP
.B null
A dummy cycle counter,
which will initialize successfully
and then fail to report cycle counts.
This is a reasonable fallback in many situations.
.PP
The built-in preference order for clock subtimers,
from most to least preferred, is
.B posix-thread-cputime
followed by
.BR stdc-clock .
The built-in preference order for cycle subtimers,
from most to least preferred, is
.B linux-perf-event
followed by
.BR x86-rdtsc ,
and then
.BR null .
.
.SS The benchmark state
A
.I benchmark state
tracks the information needed to measure performance of functions.
It is represented by a
.B "struct bench_state"
structure.
.PP
The benchmark state is initialized by calling
.BR bench_init ,
passing the address of the state structure to be initialized,
and a pointer to a timer.
If
.B bench_init
is called with a non-null timer pointer,
then it will not fail;
the benchmark state will be initialized,
and the function returns zero.
If the timer pointer is null,
then
.B bench_init
attempts to construct a timer for itself
by calling
.BR bench_createtimer .
If this succeeds,
then the benchmark state will be initialized,
and the function returns zero.
In both cases,
the timer becomes owned by the benchmark state:
calling
.B bench_destroy
on the benchmark state will destroy the timer.
If
.B bench_init
is called with a null timer pointer,
and its attempt to create a timer for itself fails,
then
.B bench_init
returns \-1;
the benchmark state is not initialized
and can safely be discarded;
calling
.B bench_destroy
on the unsuccessfully initialized benchmark state is safe and has no effect.
.PP
Calling
.B bench_destroy
on a benchmark state
releases any resources it holds,
most notably its timer, if any.
.PP
Although
.B "struct bench_state"
is defined in the header file,
only two members are available for use by applications.
.TP
.B f
A word containing flags,
set by
.B bench_calibrate
as described below.
.TP
.B target_s
The target time for which to try to run a benchmark, in seconds.
After initialization, this is set to 1.0,
though applications can override it.
.PP
Before the benchmark state can be used in measurements,
it must be
.IR calibrated .
This is performed by calling
.B bench_calibrate
on the benchmark state.
Calibration takes a noticeable amount of time
(currently about 0.25\*,s),
so it makes sense to defer it until it's known to be necessary.
.PP
Calibration is carried out separately, but in parallel,
for the timer's passage-of-time measurement and cycle counter.
Either or both of these calibrations can succeed or fail;
if passage-of-time calibration fails,
then cycle count calibration is impossible.
.PP
When it completes,
.B bench_calibrate
sets flags in the benchmark state's
.B f
member:
if passage-of-time calibration succeeded,
.B BTF_TIMEOK
is set;
if cycle-count calibration succeeded,
.B BTF_CYOK
is set;
and the flag
.B BTF_CLB
is set unconditionally,
as a persistent indication that calibration has been attempted.
.PP
The
.B bench_calibrate
function returns zero if it successfully calibrated
at least the passage-of-time measurement;
otherwise, it returns \-1.
If
.B bench_calibrate
is called for a second or subsequent time on the same benchmark state,
it returns immediately,
returning either 0 or \-1
according to whether the passage-of-time measurement
was previously calibrated successfully.
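.PP
The flag handling and return-value rules above can be modelled
by the following sketch.
It is not mLib's implementation:
the names are hypothetical, the flag values are placeholders,
the two calibration steps are supplied as callbacks,
and, for simplicity, it runs them one after the other
rather than in parallel,
skipping cycle-count calibration when passage-of-time calibration fails.
.nf
```c
#include <assert.h>

/* Placeholder flag values (hypothetical, not mLib's BTF_* values). */
#define SK_TIMEOK 1u
#define SK_CYOK 2u
#define SK_CLB 4u

struct sk_state { unsigned f; };

/* A calibration step, supplied by the caller; returns 0 on success. */
typedef int sk_calstep(void *arg);

/* On the first call, run the steps and record the outcome in b->f,
   always setting SK_CLB.  Later calls return at once.  Either way,
   return 0 if time calibration succeeded, otherwise -1. */
static int sk_calibrate(struct sk_state *b, sk_calstep *cal_time,
                        sk_calstep *cal_cy, void *arg)
{
  assert(b != 0);
  if (!(b->f & SK_CLB)) {
    if (cal_time(arg) == 0) {
      b->f |= SK_TIMEOK;
      if (cal_cy(arg) == 0) b->f |= SK_CYOK;  /* needs time calibration */
    }
    b->f |= SK_CLB;
  }
  return (b->f & SK_TIMEOK) ? 0 : -1;
}

/* Example steps that count how often they are called via *arg. */
static int sk_step_ok(void *arg) { ++*(int *)arg; return 0; }
static int sk_step_fail(void *arg) { ++*(int *)arg; return -1; }
```
.fi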
.
.SS Timing functions
A
.I benchmark function
has the signature
.IP
.BI "void " fn "(unsigned long " n ", void *" ctx );
.PP
When called, it should perform the operation to be measured
.I n
times.
The
.I ctx
argument is a pointer passed into
.B bench_measure
for the benchmark function's own purposes.
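.PP
For instance, a benchmark function measuring the summation of a small array
might look like the following sketch.
The context structure
.B sum_ctx
and the function
.B sum_bench
are hypothetical;
.B sk_bench_fn
repeats the
.B bench_fn
signature from the synopsis so that the example compiles without mLib.
The result is stored through a
.B volatile
member so that the loop's result is not dead code.
Since each iteration adds
.I len
words,
.I len
would be a natural value for the
.I base
argument of
.BR bench_measure .
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as bench_fn in the synopsis; redeclared so that this
   sketch compiles without mLib. */
typedef void sk_bench_fn(unsigned long n, void *ctx);

/* Hypothetical context: the data to work on, plus a volatile slot
   for the result so that the loop's result is not dead code. */
struct sum_ctx { const uint32_t *v; size_t len; volatile uint32_t sink; };

/* One iteration sums c->len words; the function runs n iterations. */
static void sum_bench(unsigned long n, void *ctx)
{
  struct sum_ctx *c = ctx;
  uint32_t acc = 0;

  assert(c->v != 0);
  while (n--)
    for (size_t i = 0; i < c->len; i++) acc += c->v[i];
  c->sink = acc;
}
```
.fi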
.PP
The function
.B bench_measure
receives five arguments.
.TP
.I b
points to the benchmark state to be used.
.TP
.I t_out
is the address of a
.B "struct bench_timing"
in which the measurement should be left.
This structure is described below.
.TP
.I base
is a count of the number of operations performed
by each iteration of the benchmark function.
.TP
.I fn
is a benchmark function, described above.
.TP
.I ctx
is a pointer to be passed to the benchmark function.
.B bench_measure
does not interpret this pointer in any way.
.PP
The
.B bench_measure
function calls its benchmark function repeatedly
with different iteration counts
.IR n ,
aiming for a single call to take approximately
.B target_s
seconds, as established in the benchmark state.
(Currently, if
.B target_s
holds the value
.IR t ,
then
.B bench_measure
is satisfied when a call takes at least
.IR t /\(sr2\*,s.)
Once the function finds a satisfactory number of iterations,
it stores the results in
.BI * t_out \fR.
If measurement succeeds, then
.B bench_measure
returns zero.
If it fails \(en
most likely because the timer failed \(en
then it returns \-1.
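.PP
The search for a satisfactory iteration count can be pictured with
the following sketch.
It assumes a simple doubling strategy,
which this manual does not specify,
and times calls with
.BR clock (3)
rather than a timer object;
the names
.B sk_measure
and
.B sk_spin
are hypothetical.
It does share the documented stopping rule:
a call is satisfactory once it takes at least
.IR t /\(sr2\*,s,
where
.I t
is the target time.
.nf
```c
#include <assert.h>
#include <limits.h>
#include <time.h>

typedef void sk_bench_fn(unsigned long n, void *ctx);  /* as bench_fn */

/* Assumed doubling search: run fn with n = 1, 2, 4, ... until one call
   takes at least target_s/sqrt(2) seconds of processor time.  On success,
   store n and the elapsed time and return 0; return -1 if clock() fails
   or n would overflow. */
static int sk_measure(double target_s, sk_bench_fn *fn, void *ctx,
                      unsigned long *n_out, double *t_out)
{
  double want = target_s*0.70710678118654752;   /* target_s/sqrt(2) */
  unsigned long n = 1;

  assert(target_s > 0.0);
  for (;;) {
    clock_t c0 = clock();
    fn(n, ctx);
    clock_t c1 = clock();
    if (c0 == (clock_t)-1 || c1 == (clock_t)-1) return -1;
    double t = (double)(c1 - c0)/CLOCKS_PER_SEC;
    if (t >= want) { *n_out = n; *t_out = t; return 0; }
    if (n > ULONG_MAX/2) return -1;
    n *= 2;
  }
}

/* Example workload: n rounds of busy work on a volatile counter. */
static void sk_spin(unsigned long n, void *ctx)
{
  volatile unsigned long *acc = ctx;

  while (n--)
    for (int i = 0; i < 1000; i++) *acc += (unsigned long)i;
}
```
.fi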
.PP
A
.B "struct bench_timing"
reports the outcome of a successful measurement.
It has four members.
.TP
.B f
A flags word.
.B BTF_TIMEOK
is set if the passage-of-time measurement in
.B t
is valid;
.B BTF_CYOK
is set if the cycle count in
.B cy
is valid.
.TP
.B n
The number of iterations performed by the benchmark function
on its satisfactory run,
multiplied by
.IR base .
.TP
.B t
The time taken for the satisfactory run of the benchmark function,
in seconds.
Only valid if
.B BTF_TIMEOK
is set in
.BR f .
.TP
.B cy
The number of CPU cycles used
in the satisfactory run of the benchmark function.
Only valid if
.B BTF_CYOK
is set in
.BR f .
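.PP
Because
.B n
already includes the factor
.IR base ,
dividing by it gives per-operation figures.
For example, one million iterations with a
.I base
of 4 give
.B n
= 4000000;
if
.B t
is 0.8\*,s, that is 200\*,ns per operation.
The sketch below performs this arithmetic,
respecting the flags.
It does not use mLib:
.B sk_timing
mirrors
.B "struct bench_timing"
and its flag values are placeholders,
not the real
.B BTF_*
constants.
.nf
```c
#include <assert.h>

/* Placeholder flag values (hypothetical, not mLib's BTF_* values). */
#define SK_TIMEOK 1u
#define SK_CYOK 2u

/* Mirrors struct bench_timing. */
struct sk_timing { unsigned f; double n, t, cy; };

/* Compute nanoseconds and cycles per operation.  Each output is set
   to -1 when the corresponding flag is clear. */
static void sk_per_op(const struct sk_timing *tm, double *ns_out, double *cy_out)
{
  assert(tm->n > 0.0);
  *ns_out = (tm->f & SK_TIMEOK) ? 1e9*tm->t/tm->n : -1.0;
  *cy_out = (tm->f & SK_CYOK) ? tm->cy/tm->n : -1.0;
}
```
.fi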
.
.SH "SEE ALSO"
.BR mLib (3).
.
.SH AUTHOR
Mark Wooding, <mdw@distorted.org.uk>