lmbench - system benchmarks
lmbench is a series of micro-benchmarks intended to measure basic
operating system and hardware system metrics. The benchmarks fall into
three general classes: bandwidth, latency, and "other".
Most of the lmbench benchmarks use a standard timing harness described in
timing(3) and have a few standard options: parallelism, warmup, and
repetitions. Parallelism specifies the number of benchmark processes to
run in parallel. This is primarily useful when measuring the performance
of SMP or distributed computers and can be used to evaluate the system’s
performance scalability. Warmup is the minimum number of microseconds
the benchmark should execute the benchmarked capability before it begins
measuring performance. Again, this is primarily useful for SMP or distributed
systems, and it is intended to give the process scheduler time to "settle"
and migrate processes to other processors. By measuring performance over
various warmup periods, users may evaluate the scheduler’s responsiveness.
Repetitions is the number of measurements that the benchmark should take.
This allows lmbench to provide greater or lesser statistical strength
to the results it reports. The default number of repetitions is 11.
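The warmup and repetitions options described above can be sketched as a small timing loop. This is an illustrative sketch only, not the harness from timing(3): the names `measure` and `summarize` are hypothetical, the parallelism option is left out, and summarizing by the median is an assumption made here for the example.

```python
import statistics
import time


def measure(op, warmup_us=0, repetitions=11):
    """Run `op` for a warmup period, then return per-call times in microseconds."""
    # Warmup: keep executing the operation for at least `warmup_us`
    # microseconds before any measurement is recorded.
    deadline = time.perf_counter() + warmup_us / 1e6
    while time.perf_counter() < deadline:
        op()

    # Repetitions: take one timing sample per repetition.
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1e6)
    return samples


def summarize(samples):
    # Assumption of this sketch: report the median of the samples.
    return statistics.median(samples)


samples = measure(lambda: sum(range(10_000)), warmup_us=1000)
print(f"{len(samples)} samples, median {summarize(samples):.1f} us")
```

Running the same operation with several `warmup_us` values and comparing the summaries mirrors the idea of measuring over various warmup periods.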
Data movement is fundamental to the performance of most computer
systems. The bandwidth measurements are intended to show how fast the system
can move data. The results of the bandwidth metrics can be compared, but
care must be taken to understand what is being compared. The
bandwidth benchmarks can be reduced to two main components: operating system
overhead and memory speeds. The bandwidth benchmarks report their results
as megabytes moved per second, but note that the amount of data moved is not
necessarily the same as the memory bandwidth used to move the data. Consult
the individual man pages for more information.
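The difference between data moved and memory bandwidth used can be illustrated with a simple copy. The sketch below is not lmbench code: `copy_rates_mb_per_s` is a hypothetical helper, a megabyte is taken to be 10^6 bytes, and the memory traffic of a copy is approximated as one read plus one write of every byte, which ignores cache effects.

```python
import time


def copy_rates_mb_per_s(nbytes=8 * 1024 * 1024, repetitions=11):
    """Return (MB moved per second, approximate MB of memory traffic per second)."""
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        dst[:] = src  # one copy moves `nbytes` bytes from src to dst
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer

    moved_mb = nbytes / 1e6        # data moved, as a copy benchmark would count it
    traffic_mb = 2 * nbytes / 1e6  # assumption: each byte is read once and written once
    return moved_mb / best, traffic_mb / best


moved, traffic = copy_rates_mb_per_s()
print(f"moved: {moved:.0f} MB/s, approximate memory traffic: {traffic:.0f} MB/s")
```

Under these assumptions the copy reports half as many megabytes moved as the memory system actually transferred, which is why results from different bandwidth benchmarks need care when compared.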
Each of the bandwidth benchmarks
is listed below with a brief overview of the intent of the benchmark.
- bw_file_rd: reading and summing of a file via the read(2) interface.
- bw_mem_cp: memory copy.
- bw_mem_rd: memory reading and summing.
- bw_mem_wr: memory writing.
- bw_mmap_rd: reading and summing of a file via the memory mapping mmap(2)
  interface.
- bw_pipe: reading of data via a pipe.
- bw_tcp: reading of data via a TCP/IP socket.
- bw_unix: reading data from a UNIX socket.
Control messages are also fundamental to the performance of most computer
systems. The latency measurements are intended to show how fast a system
can be told to do some operation. The results of the latency metrics can be
compared to each other for the most part. In particular, the pipe, rpc, tcp,
and udp transactions are all identical benchmarks carried out over different
system abstractions.
Latency numbers here should mostly be in microseconds
per operation.
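The idea of one identical transaction carried out over different system abstractions can be sketched as a single token-passing loop that is handed different send and receive functions. This is only an illustration, assuming a POSIX system: all function names are hypothetical, and the peer is a thread in the same process rather than a separate process, so the numbers are not comparable to lmbench results.

```python
import os
import socket
import threading
import time


def round_trip_us(send, recv, echo_recv, echo_send, rounds=1000):
    """Pass a one-byte token back and forth; return microseconds per round trip."""
    def echo():
        # Peer side: receive the token and send it straight back.
        for _ in range(rounds):
            echo_send(echo_recv())

    peer = threading.Thread(target=echo)
    peer.start()
    start = time.perf_counter()
    for _ in range(rounds):
        send(b"x")
        recv()
    elapsed = time.perf_counter() - start
    peer.join()
    return elapsed / rounds * 1e6


def pipe_latency_us(rounds=1000):
    a_r, a_w = os.pipe()  # main -> peer
    b_r, b_w = os.pipe()  # peer -> main
    try:
        return round_trip_us(
            send=lambda tok: os.write(a_w, tok),
            recv=lambda: os.read(b_r, 1),
            echo_recv=lambda: os.read(a_r, 1),
            echo_send=lambda tok: os.write(b_w, tok),
            rounds=rounds,
        )
    finally:
        for fd in (a_r, a_w, b_r, b_w):
            os.close(fd)


def unix_socket_latency_us(rounds=1000):
    # On POSIX systems socketpair() returns a connected pair of UNIX sockets.
    left, right = socket.socketpair()
    try:
        return round_trip_us(
            send=left.sendall,
            recv=lambda: left.recv(1),
            echo_recv=lambda: right.recv(1),
            echo_send=right.sendall,
            rounds=rounds,
        )
    finally:
        left.close()
        right.close()


print(f"pipe:        {pipe_latency_us():.1f} us per round trip")
print(f"UNIX socket: {unix_socket_latency_us():.1f} us per round trip")
```

Because only the transport functions differ between the two calls, any difference in the reported figures comes from the abstraction underneath, which is what makes such results comparable to each other.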
- lat_connect: the time it takes to establish a TCP/IP connection.
- lat_ctx: context switching; the number and size of processes is varied.
- lat_fcntl: fcntl file locking.
- lat_fifo: "hot potato" transaction through a UNIX FIFO.
- lat_fs: creating and deleting small files.
- lat_pagefault: the time it takes to fault in a page from a file.
- lat_mem_rd: memory read latency (accurate to the ~2-5 nanosecond range,
  reported in nanoseconds).
- lat_mmap: time to set up a memory mapping.
- lat_ops: basic processor operations, such as integer XOR, ADD, SUB, MUL,
  DIV, and MOD, and float ADD, MUL, DIV, and double ADD, MUL, DIV.
- lat_pipe: "hot potato" transaction through a Unix pipe.
- lat_proc: process creation times (various sorts).
- lat_rpc: "hot potato" transaction through Sun RPC over UDP or TCP.
- lat_select: select latency.
- lat_sig: signal installation and catch latencies. Also protection fault
  signal latency.
- lat_syscall: nontrivial entry into the system.
- lat_tcp: "hot potato" transaction through TCP.
- lat_udp: "hot potato" transaction through UDP.
- lat_unix: "hot potato" transaction through UNIX sockets.
- lat_unix_connect: the time it takes to establish a UNIX socket connection.
The remaining benchmarks fall into the "other" class.
- mhz: processor cycle time.
- tlb: TLB size and TLB miss latency.
- line: cache line size (in bytes).
- cache: cache statistics, such as line size, cache sizes, memory parallelism.
- stream: John McCalpin’s stream benchmark.
- par_mem: memory subsystem parallelism. How many requests can the memory
  subsystem service in parallel, which may depend on the location of the
  data in the memory hierarchy.
- par_ops: basic processor operation parallelism.
bargraph(1), graph(1), lmbench(3), results(3), timing(3), bw_file_rd(8),
bw_mem_cp(8), bw_mem_wr(8), bw_mmap_rd(8), bw_pipe(8), bw_tcp(8),
bw_unix(8), lat_connect(8), lat_ctx(8), lat_fcntl(8), lat_fifo(8),
lat_fs(8), lat_http(8), lat_mem_rd(8), lat_mmap(8), lat_ops(8),
lat_pagefault(8), lat_pipe(8), lat_proc(8), lat_rpc(8), lat_select(8),
lat_sig(8), lat_syscall(8), lat_tcp(8), lat_udp(8), lmdd(8), par_ops(8),
par_mem(8), mhz(8), tlb(8), line(8), cache(8), stream(8)
Funding for the development of these tools was provided by Sun
Microsystems Computer Corporation.
A large number of people have contributed to the testing and development
of lmbench.
The benchmarking code is distributed under the GPL with
additional restrictions; see the COPYING file.
Carl Staelin and Larry
McVoy
Comments, suggestions, and bug reports are always welcome.