Benchmarks
ProbStructs ships a Google Benchmark suite that measures inc and get
performance for all three data structures across a range of parameter sizes.
Results are saved as JSON so you can track performance across commits.
Result storage
Results are kept in two separate directories so local and CI runs never mix:
benchmark_results/local/— your local runs; gitignored (never committed)benchmark_results/ci/— CI runs committed tomaster; used as the regression baseline for pull requests
make bench-run and make bench-compare always target local/.
The CI workflow saves to ci/ automatically.
Prerequisites
You need CMake 3.11+, a C++17-capable compiler, and uv (used to run the comparison and regression scripts with their dependencies fetched automatically).
Install uv: https://docs.astral.sh/uv/getting-started/installation/
The Makefile finds cmake automatically in PATH and the common Homebrew
locations (/opt/homebrew/bin, /usr/local/bin).
macOS:
brew install cmake
Ubuntu/Debian:
# Kitware's APT repository provides a current release:
# https://apt.kitware.com/
sudo apt-get install cmake
Building
Benchmarks are off by default and live in their own CMake build directory
_bench so they never slow down a normal debug or release build.
make bench-build
This runs cmake with -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON,
fetches Google Benchmark via FetchContent, and compiles
_bench/benchmarks/probstructs_benchmark.
Running
make bench-run
The script scripts/bench_run.sh executes the binary and writes a
timestamped JSON file to benchmark_results/local/:
benchmark_results/local/2026-05-27_14-30-00.json
Local results are gitignored; they are only used for local comparisons. See Result storage above.
You can pass extra Google Benchmark flags after the target:
./scripts/bench_run.sh _bench/benchmarks/probstructs_benchmark \
--benchmark_filter=CountMin \
--benchmark_repetitions=5
Comparing runs
After two or more runs:
make bench-compare
This calls scripts/bench_compare.sh, which locates Google Benchmark’s
tools/compare.py inside the _bench build directory and compares the
two most-recent files in benchmark_results/local/ automatically.
The required Python packages (numpy, scipy) are fetched via uv
on the first run — no manual pip install needed.
To compare specific files:
./scripts/bench_compare.sh \
benchmark_results/local/2026-05-20_10-00-00.json \
benchmark_results/local/2026-05-27_14-30-00.json
The comparison output shows the relative change (+/-) for each
benchmark, making regressions easy to spot.
What is benchmarked
Key domains
All benchmarks use pre-generated key pools and access sequences to avoid measuring string construction inside the timed loop:
Name |
Keys |
Purpose |
|---|---|---|
Small |
1 000 |
High collision rate; sketch data fits in L1/L2 cache |
Medium |
10 000 |
Moderate collisions; L2/L3 resident |
Large |
100 000 |
Low collision rate; exercises cache pressure |
Access distributions
Uniform — every key equally likely (worst-case for accuracy, neutral for cache).
Zipf (s=1) — top 1 % of keys receive ~50 % of traffic. Realistic for web requests, event streams, and user activity logs. Hot keys stay warm in cache, which can reveal differences invisible under uniform load.
CountMinSketch
Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).
Parameters swept: sketch width (500 – 5 000), depth (3 – 7), key domain
(1 K – 100 K), distribution (uniform / Zipf). The full sketch size
(width × depth × 4 bytes) spans from well within L1 cache to L3-resident,
making the parameter sweep a de-facto cache-pressure study.
ExponentialHistorgram
Benchmarks: Inc, Get.
The key variable is tick_step — how far the stream clock advances between calls. This controls how much internal bucket-cascade work each call triggers:
Mode |
tick_step |
Meaning |
|---|---|---|
Steady |
1 |
Minimal reshuffling — hot path for continuous streams |
Bursty |
window / 4 |
Several buckets shift per call |
Expiry |
window / 2 |
Majority of stored data expires on every call |
Windows tested: 256, 1 024, 16 384.
ExponentialCountMinSketch
Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).
Parameters swept: sketch width (500 – 2 000), depth (3 – 5), window (256 – 4 096), tick_step (steady / bursty), key domain (1 K – 100 K), distribution (uniform / Zipf).
End-to-end scenarios
These simulate realistic workloads rather than isolating a single operation:
FrequencyEstimation — 90 % inserts / 10 % point queries over a 100 K Zipf key domain. Models event counting (URL hits, product views).
HeavyHitterDetection — Sliding-window frequency counting over a 10 K Zipf
domain. Every 10 inserts the current key is queried; keys exceeding 1 % of
the window are flagged. Reports heavy_hitters_pct as a counter.
RateLimiter — Per-client request rate checking against a 60-second sliding
window. Each event is either counted (if under the rate limit) or dropped.
Models API rate limiting and quota enforcement. Reports throttled_pct.
All benchmarks report items/s throughput in addition to wall time.
Continuous integration
The CI GitHub Actions workflow (.github/workflows/ci.yml) runs
automatically on every pull request and push to master:
Unit tests — built and run on Ubuntu, macOS, and Windows. A failure blocks the PR.
Benchmark regression check — benchmarks are built and run on a fixed Ubuntu runner for consistency, with
--benchmark_repetitions=3so the comparison uses the statistical mean rather than a single sample. The latest JSON file committed tomaster(inbenchmark_results/ci/) is used as the baseline. Two independent rules are evaluated:Per-benchmark rule — fails if 2 or more benchmarks are each more than 10 % slower. A single noisy outlier does not trigger this rule.
Geomean rule — fails if the geometric mean of all cpu_time ratios regresses by more than 5 %. Catches broad slowdowns spread across many benchmarks that each stay under the per-benchmark threshold.
Either rule failing blocks the PR.
Baseline update — on pushes to
masterthe new result file is automatically committed tobenchmark_results/ci/so future PRs always compare against up-to-date hardware measurements.
Note
GitHub Actions runners are virtualised. Timing can vary by ±5 % between
runs. Three repetitions, the 10 % per-benchmark threshold, and the
minimum-regressions=2 rule together filter out single-benchmark noise.
If the check becomes flaky, increase --min-regressions or widen
--threshold in the workflow file.
You can also run the regression check locally:
uv run scripts/bench_check_regression.py \
--baseline benchmark_results/ci/2026-05-20_10-00-00.json \
--current benchmark_results/local/2026-05-27_14-30-00.json \
--threshold 10 --min-regressions 2 --geomean-threshold 5
CMake option reference
BUILD_BENCHMARKSSet to
ONto include thebenchmarks/subdirectory. Default:OFF.MODERN_CMAKE_BUILD_BENCHMARKSEmergency override to enable benchmarks when ProbStructs is consumed via
add_subdirectoryfrom another project. Default: not set.