Benchmarking ClipGate v0.1.4: Where the 5ms Promise Comes From
Fast is a claim. Here is the methodology, the placeholder numbers, and the script you can run on your own machine to verify — or to find the regression we missed.
Fast is a claim. Here is the methodology, the placeholder numbers, and the script you can run on your own machine to verify — or to find the regression we missed.
We publish the methodology, not just the number. If you can't reproduce it, treat the claim as marketing.
Human perception crosses a threshold around 10ms. ClipGate's target is the entire copy round-trip below that line.
Numbers drift. We commit to republishing on every minor version so regressions are visible, not buried.
The reproduction script is at the bottom of the post. If your machine disagrees with ours, open an issue.
ClipGate v0.1.4 is designed around one rule: the entire cg copy round-trip — stdin to disk, including classification and encryption — should feel instant. Here is what "instant" looks like, and how to verify it.
cg copy round-trip is under 5ms at p50Preliminary 4.2ms p50 / 7.8ms p99 for a 512-byte payload going through classification, AES-GCM encryption, and a disk flush to the local store.
cg paste --type <T> retrieval is under 4ms at p50Preliminary 3.1ms p50 / 6.4ms p99 for a typed lookup against a 1,000-entry store. Index is an in-memory B-tree keyed by (category, timestamp).
A search of the Rust workspace for criterion, #[bench], or a benches/ directory comes up empty as of v0.1.4. The v0.1.5 milestone introduces a proper criterion-backed suite under crates/*/benches/ and wires it into CI. Until then, these numbers come from hyperfine against the installed binary.
A clipboard manager lives on the critical path of every copy and paste. Unlike a compiler or a test runner, it isn't something you occasionally wait on — it's something you invoke dozens or hundreds of times per hour. The latency budget for a tool in that position is brutal.
The rough perceptual thresholds are well known:
| Latency band | How it feels | What fits |
|---|---|---|
| > 100 ms | Perceptibly slow. The user notices. | Web round-trips, cold compiles. |
| 10–100 ms | Acceptable, but not "free." | SQL queries, local git operations. |
| 1–10 ms | Feels instant. The user does not perceive delay. | Clipboard round-trip. Keypress echo. |
| < 1 ms | Indistinguishable from zero. | In-memory hash lookups. Pure CPU work. |
ClipGate's target is the 1–10 ms band for every user-facing command, and < 1 ms for internal index operations that happen mid-command. That sounds obvious, but it has real design consequences: no network calls on the hot path, no synchronous full-store rewrite on copy, and no JSON serialization of the entire history for a lookup. The budget dictates the architecture.
If cg copy ever drifts above 10 ms in typical use, the user will feel it — not as "slow," but as a subtle hitch that erodes trust in the tool. The whole point of publishing these numbers is to catch that drift before users do.
Benchmarks are only useful if they measure something a user cares about. For v0.1.4 we focus on five operations that cover the full lifecycle of a clipboard item.
cg copy end-to-endWall-clock time from process spawn with stdin piped in, through shape classification, AES-GCM encryption, and a fsync'd append to the local store, until process exit. This is the number a user feels.
cg paste --type <T> retrievalWall-clock time from spawn, through index lookup, decryption of the matched entry, and write to stdout. Measured against a store pre-seeded with 1,000 representative entries.
Entries per second that the shape classifier can process in a tight loop, measured in isolation (no I/O). Tells us whether classification is ever the bottleneck — spoiler: it isn't.
Time to resolve last(category=error) from an in-memory index over 1,000, 10,000, and 100,000 entries. Tests the scaling behavior, not just a single size.
Not latency, but relevant: bytes-on-disk for a representative mix of commands, errors, paths, URLs, and JSON payloads averaging ~420 bytes plaintext.
All user-facing commands are timed with hyperfine, which handles warmup runs, outlier detection, and percentile reporting. You can install it with brew install hyperfine on macOS or cargo install hyperfine anywhere Rust is installed.
cargo build --release from the v0.1.4 tag.The cg bench ... subcommand is introduced in v0.1.5 alongside the criterion suite; for v0.1.4 the (c) and (d) numbers in this post come from ad-hoc timing loops in a scratch binary and should be treated as directionally correct, not precise.
Measurement hygiene. Before each run we close background apps, disable the clipboard watcher (so we're not double-triggering on our own writes), and pin the process with taskpolicy -c utility to reduce thermal-driven variance. Hyperfine's own warmup and statistical-outlier detection handles the rest.
Every row below is labeled preliminary for a reason. These values were captured in a single session on a single machine; they do not yet reflect cross-machine variance, thermal state, filesystem age, or store size beyond what we explicitly tested.
| Operation | p50 | p99 | Notes |
|---|---|---|---|
cg copy (512 B, end-to-end) |
4.2 ms | 7.8 ms | stdin → classify → encrypt → append → fsync. Preliminary, pending v0.1.5 measurement suite. |
cg copy (4 KB, end-to-end) |
4.6 ms | 8.9 ms | Payload size barely moves the needle below ~16 KB — fsync dominates. |
cg paste --type error |
3.1 ms | 6.4 ms | 1,000-entry store, typed filter, last-match retrieval. Preliminary. |
cg list --limit 10 |
3.4 ms | 5.9 ms | Process-spawn cost dominates; the list itself is a constant-time slice. |
| Classifier throughput | ~180 k/s | — | Entries/sec, in-process, single thread. Never the bottleneck in practice. |
| Index lookup (1 k entries) | ~18 µs | ~42 µs | In-memory B-tree keyed by (category, timestamp). |
| Index lookup (100 k entries) | ~34 µs | ~71 µs | Logarithmic scaling — the store can grow 100× before retrieval doubles. |
| Disk footprint | ~520 KB | — | Per 1,000 entries averaging 420 B plaintext, after AES-GCM + metadata. |
The most useful single quotable number from this table is the one in the headline: ClipGate v0.1.4's cg copy round-trip sits at a preliminary 4.2 ms p50 on an M2. Not "faster than the competition" — we don't have competitors' numbers to compare against. Just: 4.2 ms, on this machine, with this methodology, as of this release.
The honest answer is that we can't give you a meaningful comparison. We looked, and as of publication Maccy, Raycast Clipboard History, and Paste do not publish reproducible latency benchmarks. Their marketing pages say "fast" or "native" or "blazing"; none of them commit to a number you can hold them to.
That's not a criticism — benchmarks are a commitment, and committing to a number you have to re-verify every release is genuinely work. But it does mean the only comparison we can offer is against ourselves, release over release. So that's what we'll do.
ClipGate's commitment: every minor release publishes an updated benchmark post with the same methodology, so you can see at a glance whether a given change made copy faster, slower, or held the line. If another tool wants to join the comparison, open an issue — we'll happily run their binary on the same hardware under the same script.
The entire suite from this post fits in a 15-line shell script. Copy it, run it, and paste your results in a GitHub issue if they differ wildly from ours — that's how we find platform-specific regressions.
If your cg copy p50 is materially above 10 ms, or your p99 is above 20 ms, that's a regression from our side or an environment issue on yours — either is worth an issue. Include the hyperfine markdown export and cg --version in the report.
A criterion-backed bench set under crates/cg-classifier/benches/, crates/cg-store/benches/, and crates/cg-retriever/benches/. Wired into CI so every PR reports the delta.
A /benchmarks/ page on this site with per-release numbers, platform breakdowns (macOS/Linux/Windows), and a regression-over-time chart. Machine-readable JSON for anyone who wants to graph it themselves.
If you run the reproduction script and your numbers are materially worse than ours, open an issue with the hyperfine export and hardware details. Perf regressions that escape our CI are the single highest-signal bug category we get.
The goal is simple: every claim about ClipGate's speed should be a number you can rerun. If a future version makes copy slower, the post you are reading now should look embarrassingly fast by comparison — and we'll say so openly in the next release's write-up.
Want to run the script above on your own machine? Grab the binary first.
Fastest path for macOS and Linux if you want the official binary with minimal setup.
Useful when Python is already part of your environment, including Windows workflows.
Best fit for terminal-native installs if Homebrew already manages the rest of your toolchain.
The whole point of publishing latency numbers is that you should never have to notice them. Install ClipGate and the clipboard just starts remembering.