# Benchmark Report
EnforceCore includes a reproducible benchmark suite that measures per-component latency with statistical rigour. All results are generated deterministically from in-memory workloads (no network, no disk I/O in the hot path) and include warmup phases to eliminate cold-start bias.
## Methodology
| Parameter | Value |
|---|---|
| Warmup iterations | 100 (not timed) |
| Timed iterations | 1 000 per benchmark |
| Percentiles | P50, P95, P99, P99.9 |
| Statistical measures | Mean, Median, Std Dev, Min, Max |
| Clock | `time.perf_counter()` (nanosecond resolution) |
| Environment | Single-threaded, GC enabled, no external services |
Each benchmark proceeds in three phases:

1. **Warmup:** runs the function 100 times to populate caches and trigger any JIT-level optimisations in the interpreter.
2. **Timed loop:** records wall-clock time for each of the 1 000 iterations.
3. **Statistics:** computes percentiles via linear interpolation, standard deviation, and ops/second from the raw latency array.
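The three phases above can be sketched as a minimal harness. This is an illustrative sketch, not the actual `BenchmarkRunner` implementation; the function and variable names here are hypothetical.

```python
import statistics
import time


def percentile(sorted_vals: list[float], p: float) -> float:
    """Percentile via linear interpolation between closest ranks."""
    k = (len(sorted_vals) - 1) * (p / 100.0)
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def benchmark(fn, warmup: int = 100, iterations: int = 1000) -> dict:
    # Phase 1: warmup, not timed
    for _ in range(warmup):
        fn()

    # Phase 2: timed loop, one sample per iteration, in milliseconds
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    # Phase 3: statistics from the raw latency array
    samples.sort()
    mean_ms = statistics.fmean(samples)
    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(samples, 50),
        "p95_ms": percentile(samples, 95),
        "p99_ms": percentile(samples, 99),
        "std_dev_ms": statistics.stdev(samples),
        "ops_per_sec": 1000.0 / mean_ms,
    }
```

Sorting once and interpolating between closest ranks matches the common "linear" percentile method; other interpolation choices would shift tail percentiles slightly on small sample counts.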
## Benchmarks

### Core Component Benchmarks

| Benchmark | Description |
|---|---|
| policy_pre_call | Pre-call policy evaluation (3-tool allowlist, 2-tool denylist) |
| policy_post_call | Post-call policy evaluation (output size check) |
| pii_redaction | PII redaction: email + phone in ~130-char input |
| pii_clean_text | PII scan on text with no PII (fast path) |
| pii_long_text | PII redaction on ~2 KB text with scattered entities |
| audit_record | Write one Merkle-chained audit entry |
| audit_verify_100 | Verify a 100-entry Merkle chain |
| guard_overhead | Resource guard overhead on an allowed call |
| rate_limiter | Rate limiter acquire (non-contended sliding window) |
| secret_detection | Scan text for AWS keys, GitHub tokens, bearer tokens |
### Scalability Benchmarks

| Benchmark | Description |
|---|---|
| policy_allowlist_100 | Policy eval with 100 allowed tools |
| policy_allowlist_1000 | Policy eval with 1 000 allowed tools |
| policy_allowlist_10000 | Policy eval with 10 000 allowed tools |
### End-to-End Benchmarks

| Benchmark | Description |
|---|---|
| enforcer_e2e | Full enforcement pipeline (policy + audit + guard) |
| enforcer_e2e_with_pii | Full pipeline including PII redaction |
## Reference Results

Measured on Apple Silicon (arm64), Python 3.14.2, macOS. Your numbers will differ; run the suite locally for your hardware.
| Benchmark | Iterations | Mean (ms) | P50 (ms) | P95 (ms) | P99 (ms) | P99.9 (ms) | StdDev (ms) |
|---|---|---|---|---|---|---|---|
| policy_pre_call | 1,000 | 0.0927 | 0.0118 | 0.0196 | 0.2275 | 69.3257 | 2.1955 |
| policy_post_call | 1,000 | 0.0002 | 0.0002 | 0.0003 | 0.0003 | 0.0005 | 0.0000 |
| pii_redaction | 1,000 | 0.0347 | 0.0279 | 0.0357 | 0.2750 | 0.7607 | 0.0498 |
| pii_clean_text | 1,000 | 0.0288 | 0.0282 | 0.0324 | 0.0405 | 0.0858 | 0.0034 |
| pii_long_text | 1,000 | 0.1338 | 0.1286 | 0.1563 | 0.2204 | 0.3584 | 0.0182 |
| audit_record | 1,000 | 0.0786 | 0.0677 | 0.1267 | 0.2324 | 1.5794 | 0.0565 |
| audit_verify_100 | 100 | 1.1387 | 1.1136 | 1.3251 | 1.4568 | 1.4568 | 0.0815 |
| guard_overhead | 1,000 | 0.0002 | 0.0002 | 0.0003 | 0.0003 | 0.0032 | 0.0001 |
| rate_limiter | 1,000 | 0.0004 | 0.0003 | 0.0005 | 0.0016 | 0.0256 | 0.0010 |
| secret_detection | 1,000 | 0.0119 | 0.0117 | 0.0119 | 0.0166 | 0.0655 | 0.0025 |
| policy_allowlist_100 | 1,000 | 0.0251 | 0.0194 | 0.0243 | 0.3412 | 0.9967 | 0.0593 |
| policy_allowlist_1000 | 1,000 | 0.0534 | 0.0503 | 0.0624 | 0.1042 | 0.1875 | 0.0105 |
| policy_allowlist_10000 | 1,000 | 0.4300 | 0.4081 | 0.5582 | 0.7181 | 1.9790 | 0.0858 |
| enforcer_e2e | 1,000 | 0.0951 | 0.0561 | 0.2566 | 0.8919 | 3.1739 | 0.1824 |
| enforcer_e2e_with_pii | 1,000 | 0.1258 | 0.0929 | 0.4322 | 0.8068 | 0.9326 | 0.1277 |
Total duration: ~1 400 ms
## Key Observations
- Policy evaluation is sub-millisecond even at P99 for typical allowlists (≤ 1 000 tools). At 10 000 tools P99 is still under 1 ms.
- PII redaction is ~0.03 ms for short text and ~0.13 ms for 2 KB text at P50, dominated by regex scanning.
- Guard overhead and rate limiter are effectively zero-cost at < 1 μs per call.
- End-to-end enforcement (policy + audit + guard) is < 0.1 ms at P50, < 1 ms at P99. Adding PII redaction pushes P50 to ~0.09 ms.
- Audit chain verification scales linearly: ~1.1 ms for 100 entries, i.e. roughly 11 μs per entry.
## Reproduction

### CLI

```bash
# Default: 1000 iterations, Markdown output
python -m benchmarks.run

# 5000 iterations, JSON output
python -m benchmarks.run --iterations 5000 --format json

# Both formats, written to disk
python -m benchmarks.run --format all --output results/
```
### Python API

```python
from enforcecore.eval.benchmarks import BenchmarkRunner

runner = BenchmarkRunner()
suite = runner.run_all(iterations=1000)

# Markdown report
print(suite.to_markdown())

# JSON export
with open("results.json", "w") as f:
    f.write(suite.to_json())
```

### CI
Benchmarks run in CI on every push. The workflow uses `--format json` to produce machine-readable output for regression detection. See `.github/workflows/ci.yml` for configuration.
## Output Formats

### JSON

```json
{
  "metadata": {
    "timestamp": "2026-02-21T15:56:09Z",
    "python_version": "3.14.2",
    "platform": "Darwin arm64",
    "cpu": "arm",
    "machine": "arm64",
    "enforcecore_version": "1.0.0",
    "total_duration_ms": 1372.34
  },
  "results": [
    {
      "name": "policy_pre_call",
      "iterations": 1000,
      "warmup_iterations": 100,
      "mean_ms": 0.0927,
      "p50_ms": 0.0118,
      "p95_ms": 0.0196,
      "p99_ms": 0.2275,
      "p999_ms": 69.3257,
      "std_dev_ms": 2.1955
    }
  ]
}
```

### Markdown
The `to_markdown()` method renders a self-contained report section with environment metadata and a Markdown table, suitable for pasting into GitHub issues or documentation.