Benchmarks
==========

ProbStructs ships a `Google Benchmark`_ suite that measures ``inc`` and ``get``
performance for all three data structures across a range of parameter sizes.
Results are saved as JSON so you can track performance across commits.

Result storage
--------------

Results are kept in two separate directories so local and CI runs never mix:

* ``benchmark_results/local/`` — your local runs; **gitignored** (never committed)
* ``benchmark_results/ci/`` — CI runs committed to ``master``; used as the
  regression baseline for pull requests

``make bench-run`` and ``make bench-compare`` always target ``local/``.
The CI workflow saves to ``ci/`` automatically.

.. _Google Benchmark: https://github.com/google/benchmark

Prerequisites
-------------

You need **CMake 3.11+**, a C++17-capable compiler, and **uv** (used to run
the comparison and regression scripts with their dependencies fetched
automatically).

Install uv: https://docs.astral.sh/uv/getting-started/installation/

The Makefile finds cmake automatically in ``PATH`` and the common Homebrew
locations (``/opt/homebrew/bin``, ``/usr/local/bin``).

macOS:

.. code-block:: bash

    brew install cmake

Ubuntu/Debian:

.. code-block:: bash

    # Kitware's APT repository provides a current release:
    # https://apt.kitware.com/
    sudo apt-get install cmake

Building
--------

Benchmarks are **off by default** and live in their own CMake build directory
``_bench`` so they never slow down a normal debug or release build.

.. code-block:: bash

    make bench-build

This runs ``cmake`` with ``-DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON``,
fetches Google Benchmark via ``FetchContent``, and compiles
``_bench/benchmarks/probstructs_benchmark``.

Running
-------

.. code-block:: bash

    make bench-run

The script ``scripts/bench_run.sh`` executes the binary and writes a
timestamped JSON file to ``benchmark_results/local/``::

    benchmark_results/local/2026-05-27_14-30-00.json

Local results are gitignored; they are only used for local comparisons.
See *Result storage* above.

You can pass extra Google Benchmark flags after the target:

.. code-block:: bash

    ./scripts/bench_run.sh _bench/benchmarks/probstructs_benchmark \
        --benchmark_filter=CountMin \
        --benchmark_repetitions=5

Comparing runs
--------------

After two or more runs:

.. code-block:: bash

    make bench-compare

This calls ``scripts/bench_compare.sh``, which locates Google Benchmark's
``tools/compare.py`` inside the ``_bench`` build directory and compares the
two most-recent files in ``benchmark_results/local/`` automatically.
The required Python packages (``numpy``, ``scipy``) are fetched via **uv**
on the first run — no manual ``pip install`` needed.

To compare specific files:

.. code-block:: bash

    ./scripts/bench_compare.sh \
        benchmark_results/local/2026-05-20_10-00-00.json \
        benchmark_results/local/2026-05-27_14-30-00.json

The comparison output shows the relative change (``+``/``-``) for each
benchmark, making regressions easy to spot.

What is benchmarked
-------------------

Key domains
~~~~~~~~~~~

All benchmarks use pre-generated key pools and access sequences to avoid
measuring string construction inside the timed loop:

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Name
     - Keys
     - Purpose
   * - Small
     - 1 000
     - High collision rate; sketch data fits in L1/L2 cache
   * - Medium
     - 10 000
     - Moderate collisions; L2/L3 resident
   * - Large
     - 100 000
     - Low collision rate; exercises cache pressure

Access distributions
~~~~~~~~~~~~~~~~~~~~

* **Uniform** — every key equally likely (worst-case for accuracy, neutral for cache).
* **Zipf (s=1)** — top 1 % of keys receive ~50 % of traffic.  Realistic for
  web requests, event streams, and user activity logs.  Hot keys stay warm in
  cache, which can reveal differences invisible under uniform load.

CountMinSketch
~~~~~~~~~~~~~~

Benchmarks: ``Inc``, ``Get``, ``Mixed`` (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 5 000), depth (3 – 7), key domain
(1 K – 100 K), distribution (uniform / Zipf).  The full sketch size
(``width × depth × 4`` bytes) spans from well within L1 cache to L3-resident,
making the parameter sweep a de-facto cache-pressure study.

ExponentialHistorgram
~~~~~~~~~~~~~~~~~~~~~

Benchmarks: ``Inc``, ``Get``.

The key variable is **tick_step** — how far the stream clock advances between
calls.  This controls how much internal bucket-cascade work each call triggers:

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Mode
     - tick_step
     - Meaning
   * - Steady
     - 1
     - Minimal reshuffling — hot path for continuous streams
   * - Bursty
     - window / 4
     - Several buckets shift per call
   * - Expiry
     - window / 2
     - Majority of stored data expires on every call

Windows tested: 256, 1 024, 16 384.

ExponentialCountMinSketch
~~~~~~~~~~~~~~~~~~~~~~~~~

Benchmarks: ``Inc``, ``Get``, ``Mixed`` (80 % inc / 20 % get).

Parameters swept: sketch width (500 – 2 000), depth (3 – 5), window
(256 – 4 096), tick_step (steady / bursty), key domain (1 K – 100 K),
distribution (uniform / Zipf).

End-to-end scenarios
~~~~~~~~~~~~~~~~~~~~

These simulate realistic workloads rather than isolating a single operation:

**FrequencyEstimation** — 90 % inserts / 10 % point queries over a 100 K Zipf
key domain.  Models event counting (URL hits, product views).

**HeavyHitterDetection** — Sliding-window frequency counting over a 10 K Zipf
domain.  Every 10 inserts the current key is queried; keys exceeding 1 % of
the window are flagged.  Reports ``heavy_hitters_pct`` as a counter.

**RateLimiter** — Per-client request rate checking against a 60-second sliding
window.  Each event is either counted (if under the rate limit) or dropped.
Models API rate limiting and quota enforcement.  Reports ``throttled_pct``.

All benchmarks report **items/s** throughput in addition to wall time.

Continuous integration
----------------------

The ``CI`` GitHub Actions workflow (``.github/workflows/ci.yml``) runs
automatically on every pull request and push to ``master``:

* **Unit tests** — built and run on Ubuntu, macOS, and Windows.  A failure
  blocks the PR.

* **Benchmark regression check** — benchmarks are built and run on a fixed
  Ubuntu runner for consistency, with ``--benchmark_repetitions=3`` so the
  comparison uses the statistical mean rather than a single sample.  The
  latest JSON file committed to ``master`` (in ``benchmark_results/ci/``) is
  used as the baseline.  Two independent rules are evaluated:

  * **Per-benchmark rule** — fails if **2 or more** benchmarks are each more
    than **10 %** slower.  A single noisy outlier does not trigger this rule.
  * **Geomean rule** — fails if the geometric mean of all cpu_time ratios
    regresses by more than **5 %**.  Catches broad slowdowns spread across
    many benchmarks that each stay under the per-benchmark threshold.

  Either rule failing blocks the PR.

* **Baseline update** — on pushes to ``master`` the new result file is
  automatically committed to ``benchmark_results/ci/`` so future PRs always
  compare against up-to-date hardware measurements.

.. note::

   GitHub Actions runners are virtualised.  Timing can vary by ±5 % between
   runs.  Three repetitions, the 10 % per-benchmark threshold, and the
   minimum-regressions=2 rule together filter out single-benchmark noise.
   If the check becomes flaky, increase ``--min-regressions`` or widen
   ``--threshold`` in the workflow file.

You can also run the regression check locally:

.. code-block:: bash

    uv run scripts/bench_check_regression.py \
        --baseline benchmark_results/ci/2026-05-20_10-00-00.json \
        --current  benchmark_results/local/2026-05-27_14-30-00.json \
        --threshold 10 --min-regressions 2 --geomean-threshold 5

CMake option reference
----------------------

``BUILD_BENCHMARKS``
    Set to ``ON`` to include the ``benchmarks/`` subdirectory.
    Default: ``OFF``.

``MODERN_CMAKE_BUILD_BENCHMARKS``
    Emergency override to enable benchmarks when ProbStructs is consumed
    via ``add_subdirectory`` from another project.
    Default: not set.