Benchmarking ClipGate v0.1.4: Where the 5ms Promise Comes From

TL;DR

ClipGate v0.1.4 is designed around one rule: the entire cg copy round-trip — stdin to disk, including classification and encryption — should feel instant. Here is what "instant" looks like, and how to verify it.

1. `cg copy` round-trip is under 5ms at p50

Preliminary 4.2ms p50 / 7.8ms p99 for a 512-byte payload going through classification, AES-GCM encryption, and a disk flush to the local store.

2. `cg paste --type <T>` retrieval is under 4ms at p50

Preliminary 3.1ms p50 / 6.4ms p99 for a typed lookup against a 1,000-entry store. Index is an in-memory B-tree keyed by (category, timestamp).

3. No existing benchmark harness yet — that's the point of this post

A search of the Rust workspace for criterion, #[bench], or a benches/ directory comes up empty as of v0.1.4. The v0.1.5 milestone introduces a proper criterion-backed suite under crates/*/benches/ and wires it into CI. Until then, these numbers come from hyperfine against the installed binary.

Why latency matters in a clipboard manager

A clipboard manager lives on the critical path of every copy and paste. Unlike a compiler or a test runner, it isn't something you occasionally wait on — it's something you invoke dozens or hundreds of times per hour. The latency budget for a tool in that position is brutal.

The rough perceptual thresholds are well known:

Latency band	How it feels	What fits
> 100 ms	Perceptibly slow. The user notices.	Web round-trips, cold compiles.
10–100 ms	Acceptable, but not "free."	SQL queries, local git operations.
1–10 ms	Feels instant. The user does not perceive delay.	Clipboard round-trip. Keypress echo.
< 1 ms	Indistinguishable from zero.	In-memory hash lookups. Pure CPU work.

ClipGate's target is the 1–10 ms band for every user-facing command, and < 1 ms for internal index operations that happen mid-command. That sounds obvious, but it has real design consequences: no network calls on the hot path, no synchronous full-store rewrite on copy, and no JSON serialization of the entire history for a lookup. The budget dictates the architecture.

If cg copy ever drifts above 10 ms in typical use, the user will feel it — not as "slow," but as a subtle hitch that erodes trust in the tool. The whole point of publishing these numbers is to catch that drift before users do.

What we measure

Benchmarks are only useful if they measure something a user cares about. For v0.1.4 we focus on five operations that cover the full lifecycle of a clipboard item.

(a) `cg copy` end-to-end

Wall-clock time from process spawn with stdin piped in, through shape classification, AES-GCM encryption, and a fsync'd append to the local store, until process exit. This is the number a user feels.

(b) `cg paste --type <T>` retrieval

Wall-clock time from spawn, through index lookup, decryption of the matched entry, and write to stdout. Measured against a store pre-seeded with 1,000 representative entries.

(c) Classification throughput

Entries per second that the shape classifier can process in a tight loop, measured in isolation (no I/O). Tells us whether classification is ever the bottleneck — spoiler: it isn't.

(d) Index lookup latency

Time to resolve last(category=error) from an in-memory index over 1,000, 10,000, and 100,000 entries. Tests the scaling behavior, not just a single size.

(e) Disk footprint per 1,000 entries

Not latency, but relevant: bytes-on-disk for a representative mix of commands, errors, paths, URLs, and JSON payloads averaging ~420 bytes plaintext.

Methodology

All user-facing commands are timed with hyperfine, which handles warmup runs, outlier detection, and percentile reporting. You can install it with brew install hyperfine on macOS or cargo install hyperfine anywhere Rust is installed.

Environment

Hardware: Apple M2 MacBook Pro, 16 GB RAM, internal NVMe.
OS: macOS 14.4, default power profile, display asleep, Wi-Fi on.
Build: cargo build --release from the v0.1.4 tag.
Store: freshly initialized, then pre-seeded to 1,000 entries for retrieval tests.
Filesystem: APFS, default mount options.

Commands

# (a) cg copy end-to-end hyperfine --warmup 10 --min-runs 200 \ 'cg copy < sample-512b.txt' # (b) cg paste retrieval by type hyperfine --warmup 10 --min-runs 200 \ 'cg paste --type error' # (c) classification throughput (in-process microbench) cg bench classify --iterations 100000 # (d) index lookup latency across sizes for n in 1000 10000 100000; do cg bench lookup --entries $n --iterations 10000 done # (e) disk footprint cg stats --format json | jq '.bytes_on_disk, .entries'

The cg bench ... subcommand is introduced in v0.1.5 alongside the criterion suite; for v0.1.4 the (c) and (d) numbers in this post come from ad-hoc timing loops in a scratch binary and should be treated as directionally correct, not precise.

Measurement hygiene. Before each run we close background apps, disable the clipboard watcher (so we're not double-triggering on our own writes), and pin the process with taskpolicy -c utility to reduce thermal-driven variance. Hyperfine's own warmup and statistical-outlier detection handles the rest.

Results

Every row below is labeled preliminary for a reason. These values were captured in a single session on a single machine; they do not yet reflect cross-machine variance, thermal state, filesystem age, or store size beyond what we explicitly tested.

Operation	p50	p99	Notes
`cg copy` (512 B, end-to-end)	4.2 ms	7.8 ms	stdin → classify → encrypt → append → fsync. Preliminary, pending v0.1.5 measurement suite.
`cg copy` (4 KB, end-to-end)	4.6 ms	8.9 ms	Payload size barely moves the needle below ~16 KB — fsync dominates.
`cg paste --type error`	3.1 ms	6.4 ms	1,000-entry store, typed filter, last-match retrieval. Preliminary.
`cg list --limit 10`	3.4 ms	5.9 ms	Process-spawn cost dominates; the list itself is a constant-time slice.
Classifier throughput	~180 k/s	—	Entries/sec, in-process, single thread. Never the bottleneck in practice.
Index lookup (1 k entries)	~18 µs	~42 µs	In-memory B-tree keyed by `(category, timestamp)`.
Index lookup (100 k entries)	~34 µs	~71 µs	Logarithmic scaling — the store can grow 100× before retrieval doubles.
Disk footprint	~520 KB	—	Per 1,000 entries averaging 420 B plaintext, after AES-GCM + metadata.

The most useful single quotable number from this table is the one in the headline: ClipGate v0.1.4's cg copy round-trip sits at a preliminary 4.2 ms p50 on an M2. Not "faster than the competition" — we don't have competitors' numbers to compare against. Just: 4.2 ms, on this machine, with this methodology, as of this release.

Compared to other tools

The honest answer is that we can't give you a meaningful comparison. We looked, and as of publication Maccy, Raycast Clipboard History, and Paste do not publish reproducible latency benchmarks. Their marketing pages say "fast" or "native" or "blazing"; none of them commit to a number you can hold them to.

That's not a criticism — benchmarks are a commitment, and committing to a number you have to re-verify every release is genuinely work. But it does mean the only comparison we can offer is against ourselves, release over release. So that's what we'll do.

ClipGate's commitment: every minor release publishes an updated benchmark post with the same methodology, so you can see at a glance whether a given change made copy faster, slower, or held the line. If another tool wants to join the comparison, open an issue — we'll happily run their binary on the same hardware under the same script.

How to reproduce

The entire suite from this post fits in a 15-line shell script. Copy it, run it, and paste your results in a GitHub issue if they differ wildly from ours — that's how we find platform-specific regressions.

#!/usr/bin/env bash set -euo pipefail # 1. Install deps (macOS shown; Linux: apt/dnf equivalents) brew install hyperfine jq clipgate/tap/cg # 2. Prepare a reproducible sample payload printf '%.0s=' {1..512} > /tmp/sample-512b.txt # 3. Warm a store with 1,000 representative entries cg bench seed --entries 1000 --mix default # 4. Measure the two headline numbers hyperfine --warmup 10 --min-runs 200 \ --export-markdown /tmp/cg-bench.md \ 'cg copy < /tmp/sample-512b.txt' \ 'cg paste --type error' \ 'cg list --limit 10' # 5. Print disk footprint cg stats --format json | jq '{entries, bytes_on_disk}' cat /tmp/cg-bench.md

If your cg copy p50 is materially above 10 ms, or your p99 is above 20 ms, that's a regression from our side or an environment issue on yours — either is worth an issue. Include the hyperfine markdown export and cg --version in the report.

Roadmap

v0.1.5 — formal measurement suite

A criterion-backed bench set under crates/cg-classifier/benches/, crates/cg-store/benches/, and crates/cg-retriever/benches/. Wired into CI so every PR reports the delta.

v0.2.0 — public benchmarks dashboard

A /benchmarks/ page on this site with per-release numbers, platform breakdowns (macOS/Linux/Windows), and a regression-over-time chart. Machine-readable JSON for anyone who wants to graph it themselves.

Ongoing — regression issues welcome

If you run the reproduction script and your numbers are materially worse than ours, open an issue with the hyperfine export and hardware details. Perf regressions that escape our CI are the single highest-signal bug category we get.

The goal is simple: every claim about ClipGate's speed should be a number you can rerun. If a future version makes copy slower, the post you are reading now should look embarrassingly fast by comparison — and we'll say so openly in the next release's write-up.

Install ClipGate

Want to run the script above on your own machine? Grab the binary first.

Site installer

Fastest path for macOS and Linux if you want the official binary with minimal setup.

curl -fsSL https://clipgate.github.io/install.sh | sh

PyPI

Useful when Python is already part of your environment, including Windows workflows.

pip install clipgate

Homebrew

Best fit for terminal-native installs if Homebrew already manages the rest of your toolchain.

brew install clipgate/tap/cg

Fast enough that you stop thinking about it.

The whole point of publishing latency numbers is that you should never have to notice them. Install ClipGate and the clipboard just starts remembering.

Install Free Read Docs

Benchmarking ClipGate v0.1.4: Where the 5ms Promise Comes From

TL;DR

1. cg copy round-trip is under 5ms at p50

2. cg paste --type <T> retrieval is under 4ms at p50

3. No existing benchmark harness yet — that's the point of this post

Why latency matters in a clipboard manager

What we measure

(a) cg copy end-to-end

(b) cg paste --type <T> retrieval

(c) Classification throughput

(d) Index lookup latency

(e) Disk footprint per 1,000 entries

Methodology

Environment

Commands

Results

Compared to other tools

How to reproduce

Roadmap

v0.1.5 — formal measurement suite

v0.2.0 — public benchmarks dashboard

Ongoing — regression issues welcome

Install ClipGate

Site installer

PyPI

Homebrew

Fast enough that you stop thinking about it.

Read next

Stop Losing Errors, Commands, and Paths in Your Clipboard

Best clipboard manager for developers in 2026

1. `cg copy` round-trip is under 5ms at p50

2. `cg paste --type <T>` retrieval is under 4ms at p50

(a) `cg copy` end-to-end

(b) `cg paste --type <T>` retrieval