X-Git-Url: https://git.distorted.org.uk/~mdw/mLib/blobdiff_plain/c4ccbbf992e1587ae316f66af6a28006780688d8..6e683a79101025ee0d371f0b9bece811856edd8d:/test/bench.3.in

diff --git a/test/bench.3.in b/test/bench.3.in
index d5953ad..ef0b4db 100644
--- a/test/bench.3.in
+++ b/test/bench.3.in
@@ -45,11 +45,14 @@ bench \- low-level benchmarking tools
 .nf
 .B "#include "
 .PP
-.ta 2n
+.ta 2n +2n +2n
 .B "struct bench_time {"
 .B " unsigned f;"
-.B " kludge64 s;"
-.B " uint32 ns;"
+.B " union {"
+.B " struct { kludge64 s; uint32 ns; } ts;"
+.B " clock_t clk;"
+.B " kludge64 rawns;"
+.B " } t;"
 .B " kludge64 cy;"
 .B "};"
 .PP
@@ -60,9 +63,18 @@ bench \- low-level benchmarking tools
 .B " double cy;"
 .B "};"
 .PP
+.B "#define BTF_T0 0u"
+.B "#define BTF_T1 ..."
 .B "struct bench_timerops {"
 .BI " void (*describe)(struct bench_timer *" bt ", dstr *" d );
-.BI " void (*now)(struct bench_timer *" bt ", struct bench_time *" t_out );
+.ta 2n +\w'\fBint (*now)('u
+.BI " int (*now)(struct bench_timer *" bt ,
+.BI " struct bench_time *" t_out ", unsigned " f );
+.ta 2n +\w'\fBvoid (*diff)('u
+.BI " void (*diff)(struct bench_timer *" bt ,
+.BI " struct bench_timing *" delta_out ,
+.BI " const struct bench_time *" t0 ,
+.BI " const struct bench_time *" t1 );
 .BI " void (*destroy)(struct bench_timer *" bt );
 .B "};"
 .B "struct bench_timer {"
@@ -140,49 +152,54 @@ must always point to the timer object itself.
 Write a description of the timer to the dynamic string
 .IR d .
 .TP
-.IB tm ->ops->now( tm ", " t_out)
+.IB tm ->ops->now( tm ", " t_out ", " f )
 Store the current time in
-.IR t_out .
+.BI * t_out \fR.
 The
-.B struct bench_time
-used to represent the time reported by a timer
-is described in detail below.
+.B BTF_T1
+flag is set in
+.I f
+to indicate that this is the second call in a pair;
+leave it clear for the first call.
+(A fake
+.B BTF_T0
+flag is defined to be zero for symmetry.)
+Return zero on success
+.I or
+permanent failure;
+return \-1 if timing failed but
+trying again immediately has a reasonable chance of success.
+.TP
+.IB tm ->ops->diff( tm ", " delta_out ", " t0 ", " t1 )
+Store in
+.BI * delta_out
+the difference between the two times
+.I t0
+and
+.IR t1 .
 .TP
 .IB tm ->ops->destroy( tm )
 Destroy the timer,
 releasing all of the resources that it holds.
 .PP
-A time, a reported by a timer, is represented by the
-.BR "struct bench_time" .
-A passage-of-time measurement is stored in the
-.B s
-and
-.B ns
-members, holding seconds and nanoseconds respectively.
-(A timer need not have nanosecond precision.
-The exact interpretation of the time \(en
-e.g., whether it measures wallclock time,
-user-mode CPU time,
-or total thread CPU time \(en
-is a matter for the specific timer implementation.)
-A cycle count is stored in the
-.B cy
-member.
-The
+A
+.B bench_timing
+structure reports the difference between two times,
+as determined by a timer's
+.B diff
+function.
+It has four members.
+.TP
 .B f
-member stores flags:
+A flags word.
 .B BTF_TIMEOK
-is set if the passage-of-time measurement
-.B s
-and
-.B ns
-are valid; and
+is set if the passage-of-time measurement in
+.B t
+is valid;
 .B BTF_CYOK
-is set if the cycle count
+is set if the cycle count in
 .B cy
 is valid.
-Neither the time nor the cycle count need be measured
-relative to any particular origin.
 The mask
 .B BTF_ANY
 covers the
@@ -191,9 +208,57 @@ and
 .B BTF_CYOK
 bits:
 hence,
-.IB f &BTF_ANY
+.B f&BTF_ANY
 is nonzero (true)
 if the timer returned any valid timing information.
+.TP
+.B n
+The number of iterations performed by the benchmark function
+on its satisfactory run,
+multiplied by
+.IR base .
+.TP
+.B t
+The time taken for the satisfactory run of the benchmark function,
+in seconds.
+Only valid if
+.B BTF_TIMEOK
+is set in
+.BR f .
+.TP
+.B cy
+The number of CPU cycles used
+in the satisfactory run of the benchmark function,
+in seconds.
+Only valid if
+.B BTF_CYOK
+is set in
+.BR f .
+.PP
+A
+.B "struct bench_time"
+represents a single instant in time,
+as captured by a timer's
+.B now
+function.
+The use of this structure is a private matter for the timer:
+the only hard requirement is that the
+.B diff
+function should be able to compute the difference between two times.
+However, the intent is that
+a passage-of-time measurement is stored in the
+.B t
+union,
+a cycle count is stored in the
+.B cy
+member, and
+the
+.B f
+member stores flags
+.B BTF_TIMEOK
+and/or
+.B BTF_CYOK
+if the passage-of-time or cycle count respectively are valid.
 .
 .SS The built-in timer
 The function
@@ -249,6 +314,10 @@ then construction of the timer as a whole fails.
 The clock subtimers are as follows.
 Not all of them will be available on every platform.
 .TP
+.B linux-x86-perf-rdpmc-hw-cycles
+This is a dummy companion to the similarly named cycle subtimer;
+see its description below.
+.TP
 .B posix-thread-cputime
 Measures the passage of time using
 .BR clock_gettime (2),
@@ -269,8 +338,8 @@ if other threads are running.
 The cycle subtimers are as follows.
 Not all of them will be available on every platform.
 .TP
-.B linux-perf-event
-Counts CPU cycles using the Linux-specific
+.B linux-perf-read-hw-cycles
+Counts CPU cycles using the Linux-specific
 .BR perf_event_open (2)
 function to read the
 .BR PERF_\%COUNT_\%HW_\%CPU_\%CYCLES
@@ -282,13 +351,48 @@ e.g., because the
 .B /proc/sys/kernel/perf_event_paranoid
 level is too high.
 .TP
-.B x86-rdtsc
-Counts CPU cycles using the x86
+.B linux-perf-rdpmc-hw-cycles
+Counts CPU cycles using the Linux-specific
+.BR perf_event_open (2)
+function,
+as for
+.B linux-x86-perf-read-hw-cycles
+above,
+except that it additionally uses the i386/AMD64
 .B rdtsc
+and
+.B rdpmc
+instructions,
+together with information provided by the kernel
+through a memory-mapped page
+to do its measurements without any system call overheads.
+It does passage-of-time and cycle counting in a single operation,
+so no separate clock subtimer is required:
+the similarly-named clock subtimer does nothing
+except check that the
+.B linux-x86-perf-rdpmc-hw-cycles
+cycle subtimer has been selected.
+This is almost certainly the best choice if it's available.
+.TP
+.B x86-rdtscp
+Counts CPU cycles using the x86
+.B rdtscp
 instruction.
 This instruction is not really suitable for performance measurement:
 it gives misleading results on CPUs with variable clock frequency.
 .TP
+.B x86-rdtsc
+Counts CPU cycles using the x86
+.B rdtsc
+instruction.
+This has the downsides of
+.B rdtscp
+above,
+but also fails to detect when the thread has been suspended
+or transferred to a different CPU core
+and gives misleading answers in this case.
+Not really recommended.
+.TP
 .B null
 A dummy cycle counter,
 which will initialize successfully
@@ -297,15 +401,21 @@ This is a reasonable fallback in many situations.
 .PP
 The built-in preference order for clock subtimers,
 from most to least preferred, is
-.B posix-thread-cputime
+.BR linux-x86-perf-rdpmc-hw-cycles ,
 followed by
+.BR posix-thread-cputime ,
+and finally
 .BR stdc-clock .
 The built-in preference order for cycle subtimers,
 from most to least preferred, is
-.B linux-perf-event
+.BR linux-x86-perf-rdpmc-hw-cycles
+then
+.BR linux-x86-perf-read-hw-cycles ,
 followed by
+.BR x86-rdtscp ,
+and
 .BR x86-rdtsc ,
-and then
+and finally
 .BR null .
 .
 .SS The benchmark state
@@ -483,45 +593,6 @@ returns zero.
 If it fails \(en
 most likely because the timer failed \(en
 then it returns \-1.
-.PP
-A
-.B bench_timing
-structure reports the outcome of a successful measurement.
-It has four members.
-.TP
-.B f
-A flags word.
-.B BTF_TIMEOK
-is set if the passage-of-time measurement in
-.B t
-is valid;
-.B BTF_CYOK
-is set if the cycle count in
-.B cy
-is valid.
-.TP
-.B n
-The number of iterations performed by the benchmark function
-on its satisfactory run,
-multiplied by
-.IR base .
-.TP
-.B t
-The time taken for the satisfactory run of the benchmark function,
-in seconds.
-Only valid if
-.B BTF_TIMEOK
-is set in
-.BR f .
-.TP
-.B cy
-The number of CPU cycles used
-in the satisfactory run of the benchmark function,
-in seconds.
-Only valid if
-.B BTF_CYOK
-is set in
-.BR f .
 .
 .\"--------------------------------------------------------------------------
 .SH "SEE ALSO"