Benchmarks

ProbStructs ships a Google Benchmark suite that measures inc and get performance for all three data structures across a range of parameter sizes. Results are saved as JSON so you can track performance across commits.

Result storage

Results are kept in two separate directories so local and CI runs never mix:

benchmark_results/local/ — your local runs; gitignored (never committed)
benchmark_results/ci/ — CI runs committed to master; used as the regression baseline for pull requests

make bench-run and make bench-compare always target local/. The CI workflow saves to ci/ automatically.

Prerequisites

You need CMake 3.11+, a C++17-capable compiler, and uv (used to run the comparison and regression scripts with their dependencies fetched automatically).

Install uv: https://docs.astral.sh/uv/getting-started/installation/

The Makefile finds cmake automatically in PATH and the common Homebrew locations (/opt/homebrew/bin, /usr/local/bin).

macOS:

brew install cmake

Ubuntu/Debian:

# Kitware's APT repository provides a current release:
# https://apt.kitware.com/
sudo apt-get install cmake

Building

Benchmarks are off by default and live in their own CMake build directory _bench so they never slow down a normal debug or release build.

make bench-build

This runs cmake with -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON, fetches Google Benchmark via FetchContent, and compiles _bench/benchmarks/probstructs_benchmark.

Running

make bench-run

The script scripts/bench_run.sh executes the binary and writes a timestamped JSON file to benchmark_results/local/:

benchmark_results/local/2026-05-27_14-30-00.json

Local results are gitignored; they are only used for local comparisons. See Result storage above.

You can pass extra Google Benchmark flags after the target:

./scripts/bench_run.sh _bench/benchmarks/probstructs_benchmark \
    --benchmark_filter=CountMin \
    --benchmark_repetitions=5

Comparing runs

After two or more runs:

make bench-compare

This calls scripts/bench_compare.sh, which locates Google Benchmark’s tools/compare.py inside the _bench build directory and compares the two most-recent files in benchmark_results/local/ automatically. The required Python packages (numpy, scipy) are fetched via uv on the first run — no manual pip install needed.

To compare specific files:

./scripts/bench_compare.sh \
    benchmark_results/local/2026-05-20_10-00-00.json \
    benchmark_results/local/2026-05-27_14-30-00.json

The comparison output shows the relative change (+/-) for each benchmark, making regressions easy to spot.

What is benchmarked

Key domains

All benchmarks use pre-generated key pools and access sequences to avoid measuring string construction inside the timed loop:

Name	Keys	Purpose
Small	1 000	High collision rate; sketch data fits in L1/L2 cache
Medium	10 000	Moderate collisions; L2/L3 resident
Large	100 000	Low collision rate; exercises cache pressure

Access distributions

Uniform — every key equally likely (worst-case for accuracy, neutral for cache).
Zipf (s=1) — top 1 % of keys receive ~50 % of traffic. Realistic for web requests, event streams, and user activity logs. Hot keys stay warm in cache, which can reveal differences invisible under uniform load.

CountMinSketch

Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 5 000), depth (3 – 7), key domain (1 K – 100 K), distribution (uniform / Zipf). The full sketch size (width × depth × 4 bytes) spans from well within L1 cache to L3-resident, making the parameter sweep a de-facto cache-pressure study.

ExponentialHistorgram

Benchmarks: Inc, Get.

The key variable is tick_step — how far the stream clock advances between calls. This controls how much internal bucket-cascade work each call triggers:

Mode	tick_step	Meaning
Steady	1	Minimal reshuffling — hot path for continuous streams
Bursty	window / 4	Several buckets shift per call
Expiry	window / 2	Majority of stored data expires on every call

Windows tested: 256, 1 024, 16 384.

ExponentialCountMinSketch

Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 2 000), depth (3 – 5), window (256 – 4 096), tick_step (steady / bursty), key domain (1 K – 100 K), distribution (uniform / Zipf).

End-to-end scenarios

These simulate realistic workloads rather than isolating a single operation:

FrequencyEstimation — 90 % inserts / 10 % point queries over a 100 K Zipf key domain. Models event counting (URL hits, product views).

HeavyHitterDetection — Sliding-window frequency counting over a 10 K Zipf domain. Every 10 inserts the current key is queried; keys exceeding 1 % of the window are flagged. Reports heavy_hitters_pct as a counter.

RateLimiter — Per-client request rate checking against a 60-second sliding window. Each event is either counted (if under the rate limit) or dropped. Models API rate limiting and quota enforcement. Reports throttled_pct.

All benchmarks report items/s throughput in addition to wall time.

Continuous integration

The CI GitHub Actions workflow (.github/workflows/ci.yml) runs automatically on every pull request and push to master:

Unit tests — built and run on Ubuntu, macOS, and Windows. A failure blocks the PR.
Benchmark regression check — benchmarks are built and run on a fixed Ubuntu runner for consistency, with --benchmark_repetitions=3 so the comparison uses the statistical mean rather than a single sample. The latest JSON file committed to master (in benchmark_results/ci/) is used as the baseline. Two independent rules are evaluated:
- Per-benchmark rule — fails if 2 or more benchmarks are each more than 10 % slower. A single noisy outlier does not trigger this rule.
- Geomean rule — fails if the geometric mean of all cpu_time ratios regresses by more than 5 %. Catches broad slowdowns spread across many benchmarks that each stay under the per-benchmark threshold.
Either rule failing blocks the PR.
Baseline update — on pushes to master the new result file is automatically committed to benchmark_results/ci/ so future PRs always compare against up-to-date hardware measurements.

Note

GitHub Actions runners are virtualised. Timing can vary by ±5 % between runs. Three repetitions, the 10 % per-benchmark threshold, and the minimum-regressions=2 rule together filter out single-benchmark noise. If the check becomes flaky, increase --min-regressions or widen --threshold in the workflow file.

You can also run the regression check locally:

uv run scripts/bench_check_regression.py \
    --baseline benchmark_results/ci/2026-05-20_10-00-00.json \
    --current  benchmark_results/local/2026-05-27_14-30-00.json \
    --threshold 10 --min-regressions 2 --geomean-threshold 5

CMake option reference

BUILD_BENCHMARKS: Set to ON to include the benchmarks/ subdirectory. Default: OFF.
MODERN_CMAKE_BUILD_BENCHMARKS: Emergency override to enable benchmarks when ProbStructs is consumed via add_subdirectory from another project. Default: not set.