Benchmarks

ProbStructs ships a Google Benchmark suite that measures inc and get performance for all three data structures across a range of parameter sizes. Results are saved as JSON so you can track performance across commits.

Result storage

Results are kept in two separate directories so local and CI runs never mix:

  • benchmark_results/local/ — your local runs; gitignored (never committed)

  • benchmark_results/ci/ — CI runs committed to master; used as the regression baseline for pull requests

make bench-run and make bench-compare always target local/. The CI workflow saves to ci/ automatically.

Prerequisites

You need CMake 3.11+, a C++17-capable compiler, and uv (used to run the comparison and regression scripts with their dependencies fetched automatically).

Install uv: https://docs.astral.sh/uv/getting-started/installation/

The Makefile finds cmake automatically in PATH and the common Homebrew locations (/opt/homebrew/bin, /usr/local/bin).

macOS:

brew install cmake

Ubuntu/Debian:

# Kitware's APT repository provides a current release:
# https://apt.kitware.com/
sudo apt-get install cmake

Building

Benchmarks are off by default and live in their own CMake build directory _bench so they never slow down a normal debug or release build.

make bench-build

This runs cmake with -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON, fetches Google Benchmark via FetchContent, and compiles _bench/benchmarks/probstructs_benchmark.

Running

make bench-run

The script scripts/bench_run.sh executes the binary and writes a timestamped JSON file to benchmark_results/local/:

benchmark_results/local/2026-05-27_14-30-00.json

Local results are gitignored; they are only used for local comparisons. See Result storage above.

You can pass extra Google Benchmark flags after the target:

./scripts/bench_run.sh _bench/benchmarks/probstructs_benchmark \
    --benchmark_filter=CountMin \
    --benchmark_repetitions=5

Comparing runs

After two or more runs:

make bench-compare

This calls scripts/bench_compare.sh, which locates Google Benchmark’s tools/compare.py inside the _bench build directory and compares the two most-recent files in benchmark_results/local/ automatically. The required Python packages (numpy, scipy) are fetched via uv on the first run — no manual pip install needed.

To compare specific files:

./scripts/bench_compare.sh \
    benchmark_results/local/2026-05-20_10-00-00.json \
    benchmark_results/local/2026-05-27_14-30-00.json

The comparison output shows the relative change (+/-) for each benchmark, making regressions easy to spot.

What is benchmarked

Key domains

All benchmarks use pre-generated key pools and access sequences to avoid measuring string construction inside the timed loop:

Name

Keys

Purpose

Small

1 000

High collision rate; sketch data fits in L1/L2 cache

Medium

10 000

Moderate collisions; L2/L3 resident

Large

100 000

Low collision rate; exercises cache pressure

Access distributions

  • Uniform — every key equally likely (worst-case for accuracy, neutral for cache).

  • Zipf (s=1) — top 1 % of keys receive ~50 % of traffic. Realistic for web requests, event streams, and user activity logs. Hot keys stay warm in cache, which can reveal differences invisible under uniform load.

CountMinSketch

Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 5 000), depth (3 – 7), key domain (1 K – 100 K), distribution (uniform / Zipf). The full sketch size (width × depth × 4 bytes) spans from well within L1 cache to L3-resident, making the parameter sweep a de-facto cache-pressure study.

ExponentialHistorgram

Benchmarks: Inc, Get.

The key variable is tick_step — how far the stream clock advances between calls. This controls how much internal bucket-cascade work each call triggers:

Mode

tick_step

Meaning

Steady

1

Minimal reshuffling — hot path for continuous streams

Bursty

window / 4

Several buckets shift per call

Expiry

window / 2

Majority of stored data expires on every call

Windows tested: 256, 1 024, 16 384.

ExponentialCountMinSketch

Benchmarks: Inc, Get, Mixed (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 2 000), depth (3 – 5), window (256 – 4 096), tick_step (steady / bursty), key domain (1 K – 100 K), distribution (uniform / Zipf).

End-to-end scenarios

These simulate realistic workloads rather than isolating a single operation:

FrequencyEstimation — 90 % inserts / 10 % point queries over a 100 K Zipf key domain. Models event counting (URL hits, product views).

HeavyHitterDetection — Sliding-window frequency counting over a 10 K Zipf domain. Every 10 inserts the current key is queried; keys exceeding 1 % of the window are flagged. Reports heavy_hitters_pct as a counter.

RateLimiter — Per-client request rate checking against a 60-second sliding window. Each event is either counted (if under the rate limit) or dropped. Models API rate limiting and quota enforcement. Reports throttled_pct.

All benchmarks report items/s throughput in addition to wall time.

Continuous integration

The CI GitHub Actions workflow (.github/workflows/ci.yml) runs automatically on every pull request and push to master:

  • Unit tests — built and run on Ubuntu, macOS, and Windows. A failure blocks the PR.

  • Benchmark regression check — benchmarks are built and run on a fixed Ubuntu runner for consistency, with --benchmark_repetitions=3 so the comparison uses the statistical mean rather than a single sample. The latest JSON file committed to master (in benchmark_results/ci/) is used as the baseline. Two independent rules are evaluated:

    • Per-benchmark rule — fails if 2 or more benchmarks are each more than 10 % slower. A single noisy outlier does not trigger this rule.

    • Geomean rule — fails if the geometric mean of all cpu_time ratios regresses by more than 5 %. Catches broad slowdowns spread across many benchmarks that each stay under the per-benchmark threshold.

    Either rule failing blocks the PR.

  • Baseline update — on pushes to master the new result file is automatically committed to benchmark_results/ci/ so future PRs always compare against up-to-date hardware measurements.

Note

GitHub Actions runners are virtualised. Timing can vary by ±5 % between runs. Three repetitions, the 10 % per-benchmark threshold, and the minimum-regressions=2 rule together filter out single-benchmark noise. If the check becomes flaky, increase --min-regressions or widen --threshold in the workflow file.

You can also run the regression check locally:

uv run scripts/bench_check_regression.py \
    --baseline benchmark_results/ci/2026-05-20_10-00-00.json \
    --current  benchmark_results/local/2026-05-27_14-30-00.json \
    --threshold 10 --min-regressions 2 --geomean-threshold 5

CMake option reference

BUILD_BENCHMARKS

Set to ON to include the benchmarks/ subdirectory. Default: OFF.

MODERN_CMAKE_BUILD_BENCHMARKS

Emergency override to enable benchmarks when ProbStructs is consumed via add_subdirectory from another project. Default: not set.