Rust HFT Performance Lab — Strategy, Tooling & GCP Integration

Beyond `cargo bench`

In high-frequency trading, mean response time tells you very little. You live and die by the 99.9th percentile (tail latency) and by determinism. Standard tools like criterion.rs are excellent for benchmarking isolated logic, but HFT requires a specialized stack focused on instruction counting, cache locality, and jitter analysis.

Determinism > Speed

Wall-clock time varies with OS noise. Use an instruction-counting harness (Iai-Callgrind) to regression-test performance reliably in CI.

Measure the Tail

Averages hide spikes. Use HdrHistogram to capture the full distribution. One slow packet can lose the trade.
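A minimal sketch of the idea behind histogram-based tail measurement, using only the standard library. It is not the HdrHistogram API: the `LatencyHistogram` type, its power-of-two bucket layout, and the method names are assumptions made for illustration. Recording only touches a fixed-size array, so the hot path never allocates.

```rust
/// Fixed-size latency histogram with power-of-two buckets (illustrative only;
/// HdrHistogram offers much finer, configurable resolution).
pub struct LatencyHistogram {
    /// buckets[i] counts samples whose value has bit length i
    /// (bucket 0 holds only the value 0).
    buckets: [u64; 65],
    total: u64,
}

impl LatencyHistogram {
    pub fn new() -> Self {
        Self { buckets: [0; 65], total: 0 }
    }

    /// Record one latency sample in nanoseconds. No heap allocation.
    pub fn record(&mut self, nanos: u64) {
        let idx = (64 - nanos.leading_zeros()) as usize;
        self.buckets[idx] += 1;
        self.total += 1;
    }

    /// Upper bound (in ns) of the bucket that contains the given quantile,
    /// e.g. q = 0.999 for p99.9. Returns None if nothing was recorded.
    pub fn quantile_upper_bound(&self, q: f64) -> Option<u64> {
        if self.total == 0 {
            return None;
        }
        // Nearest-rank position (1-based) of the sample we are looking for.
        let rank = ((q * self.total as f64).ceil() as u64).clamp(1, self.total);
        let mut seen = 0u64;
        for (idx, count) in self.buckets.iter().enumerate() {
            seen += count;
            if seen >= rank {
                // Bucket idx covers values in [2^(idx-1), 2^idx - 1].
                return Some(match idx {
                    0 => 0,
                    64 => u64::MAX,
                    _ => (1u64 << idx) - 1,
                });
            }
        }
        None
    }
}
```

With 980 samples of 1,000 ns and 20 samples of 1,000,000 ns, the median lands in the 1,023 ns bucket while p99 lands in the 1,048,575 ns bucket, which is the spike an average would blur.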

Hardware Awareness

Understand L3 cache misses and branch mispredictions. Use perf and causal profiling to find them.

The HFT Rust Stack Recommendation

UPDATED 2026

Category	Legacy/Standard	HFT Recommendation	Why?
Micro-benchmarks	Criterion.rs	Divan	Simpler API, lower overhead, pinned-cpu friendly.
CI Regression	Criterion (Noise sensitive)	Iai-Callgrind	Runs under Valgrind's Callgrind. Counts instructions, not time, so results are nearly identical from run to run.
Live Metrics	Averages / Prometheus	HdrHistogram	Zero-allocation recording of p99, p99.9, p99.99 latency.
Profiling	Flamegraph	Samply / Hotspot	Better visualization of time-lines and hardware events.

Rust vs Go for HFT on GCP

A strategic analysis of why Rust is the superior choice for a GCP-hosted trading exchange with existing Rust DNA — and where Go still fits.

⚡ Executive Verdict

For a <10-person team building a trading exchange on GCP with an existing Rust codebase, the "Polyglot Tax" of maintaining both Go and Rust outweighs Go's velocity advantage. The maturation of Google's official Rust SDKs (140+ APIs) and the Alloy library for Ethereum removes the final barriers to an all-Rust architecture.

Memory Safety • Zero GC Pauses • rust_decimal (128-bit) • Alloy (Type-Safe EVM) • 140+ GCP APIs

⏱ Latency Determinism

Go's GC creates a dangerous feedback loop: during high-volatility events (when the exchange must be most responsive), allocation rates spike, forcing the GC to run more frequently.

Go GC STW pause: 100–200 µs

Go GC CPU overhead: Up to 25%

Rust p99 ≈ p50? Yes (flat profile)

🧮 Financial Precision

float64 is forbidden in core ledger code. The choice of decimal library directly impacts GC pressure.

Go shopspring/decimal: Heap (big.Int)

Rust rust_decimal: Stack (128-bit)

Precision: 28 decimal places
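Why floats are banned from ledger code can be seen with one addition. The sketch below uses only the standard library: `Amount` is a hypothetical fixed-point type (an `i128` count of 1e-8 units, a scale assumed for illustration) standing in for a real decimal library such as rust_decimal, whose API is not shown here. Like a rust_decimal value, it lives on the stack and needs no allocator.

```rust
/// Hypothetical fixed-point amount: an integer count of 1e-8 units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Amount(i128);

impl Amount {
    const SCALE: i128 = 100_000_000; // 8 decimal places (assumed for this sketch)

    /// Build an amount from whole units and a fractional part in 1e-8 units.
    pub fn new(whole: i128, frac_e8: i128) -> Self {
        Amount(whole * Self::SCALE + frac_e8)
    }

    /// Checked addition: overflow is reported instead of wrapping silently.
    pub fn checked_add(self, other: Amount) -> Option<Amount> {
        self.0.checked_add(other.0).map(Amount)
    }
}

/// Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
pub fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}

/// The same sum in fixed-point integers is exact.
pub fn fixed_sum_is_exact() -> bool {
    let a = Amount::new(0, 10_000_000); // 0.1
    let b = Amount::new(0, 20_000_000); // 0.2
    a.checked_add(b) == Some(Amount::new(0, 30_000_000)) // 0.3
}
```

`float_sum_is_exact()` returns false and `fixed_sum_is_exact()` returns true. A ledger built on the former accumulates rounding error with every fill.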

💰 Cloud Run Economics

Rust binaries are lean native code with no managed runtime and no GC, so they start quickly and hold less memory. This maps directly to lower GCP bills.

Memory savings: 50–70% less

Cold start: Near-instant

GCP pricing unit: GB-seconds
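Because memory is billed per GB-second, a smaller footprint translates directly into fewer billed units. A small worked calculation follows; the instance sizes are hypothetical, 1 GB is treated as 1024 MiB for simplicity, and no real price list is used.

```rust
/// Billed memory units for one instance: GB of configured memory x seconds active.
/// Treats 1 GB as 1024 MiB (an assumption of this sketch).
pub fn gb_seconds(memory_mib: u64, active_seconds: u64) -> f64 {
    (memory_mib as f64 / 1024.0) * active_seconds as f64
}

/// Fractional reduction in billed memory when moving from `before_mib` to `after_mib`
/// over the same active time.
pub fn memory_saving(before_mib: u64, after_mib: u64, active_seconds: u64) -> f64 {
    let before = gb_seconds(before_mib, active_seconds);
    let after = gb_seconds(after_mib, active_seconds);
    1.0 - after / before
}
```

For one hour of activity, a hypothetical 512 MiB instance accrues 1,800 GB-seconds and a 256 MiB instance accrues 900, a saving of 0.5 (50%).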

🔗 Blockchain: Alloy > Geth

Alloy (by Paradigm) replaced ethers-rs as the Rust Ethereum standard. The sol! macro generates type-safe bindings at compile time.

Memory vs Geth: Significantly less

Serde zero-copy: 2–5× faster indexing

ABI safety: Compile-time

GCP SDK Maturity: Go vs Rust (2026)

COMPARATIVE

Capability	Go	Rust
API Coverage	300+ APIs (over a decade)	140+ APIs (official)
Auth (ADC)	Mature	google-cloud-auth ✓
Secret Manager	First-class	Official crate
Cloud Logging	Auto-instrument	JSON stdout → auto-parse
Cloud Trace	Auto-instrument	tracing-opentelemetry
Async runtime	Goroutines	Tokio (async/await)

🛡 Compliance & Security Edge

Memory Safety

About 70% of the CVEs Microsoft assigns each year are memory safety bugs. Safe Rust rules out this class at compile time, which reduces audit scope for SOC2/ISO 27001.

Data Race Freedom

The Send and Sync traits let the compiler reject data races in safe code. Go only detects them at runtime, via the -race flag.
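A small illustration of the compile-time guarantee, using only the standard library. The function name and the counts are arbitrary. Handing a plain `&mut u64` or an `Rc<Cell<u64>>` to the spawned threads would be rejected by the compiler; wrapping the state in `Arc<Mutex<_>>` satisfies `Send + Sync`, so this version compiles.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment a shared counter from several threads and return the final value.
pub fn parallel_count(threads: usize, increments_per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    // The lock is the only way to reach the u64.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

`parallel_count(4, 1_000)` always returns 4,000. There is no interleaving in which an increment is lost, and no race detector is needed to establish that.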

Key Hygiene

The secrecy crate zeroes key material when it is dropped. Go's GC cannot guarantee when, or whether, a freed buffer is overwritten.
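The zero-on-drop idea can be sketched with the standard library alone. `KeyMaterial` is a hypothetical type, not the secrecy or zeroize API; production code should use those crates, which take more care to keep the compiler from optimizing the wipe away and to avoid stray copies.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical wrapper that wipes its bytes when dropped.
pub struct KeyMaterial {
    bytes: [u8; 32],
}

impl KeyMaterial {
    pub fn new(bytes: [u8; 32]) -> Self {
        Self { bytes }
    }

    /// Borrow the secret for signing; callers should not copy it out.
    pub fn expose(&self) -> &[u8; 32] {
        &self.bytes
    }

    /// Overwrite every byte with zero using volatile writes.
    pub fn wipe(&mut self) {
        for byte in self.bytes.iter_mut() {
            // Volatile write: the compiler may not elide it as a dead store.
            unsafe { ptr::write_volatile(byte, 0) };
        }
        // Keep the compiler from reordering later operations before the wipe.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for KeyMaterial {
    fn drop(&mut self) {
        // Runs deterministically when the value goes out of scope.
        self.wipe();
    }
}
```

The point of the design is timing: `drop` runs at a known place in the program, so the wipe happens when the key leaves scope rather than whenever a collector gets around to it.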

🚪 The "Escape Hatch": When Go is Still Valid

If Go must be introduced, limit it to isolated, stateless, non-financial tools:

Ops CLI tools
Read-only dashboard backends (BigQuery readers)
Log cleanup scripts

Rule: Go code must never handle the Order struct or private key material.

Ollo GCP SDK Audit Feb 2026

A live audit of how Ollo's Rust services actually use (or don't use) the official Google Cloud Rust SDKs — and the low-hanging fruit for better observability and lower bills.

Google SDK Adoption: ~5%

Services with any Google crate: 1 / 6

Services with OpenTelemetry

Per-Service SDK Matrix

LIVE AUDIT

Service	Google Crates	Logging Stack	Findings
gateway	google-cloud-auth, google-cloud-logging-v2 ⚠️	tracing (JSON)	Logging crate is a phantom dep (never imported). Secret Manager via raw `reqwest`.
broadcast	google-cloud-auth (commented out)	tracing	No active Google SDK usage.
oracle	None ❌	log + env_logger	Legacy logging. Unstructured text output.
quote	None ❌	log + env_logger	Legacy logging. Same issue as oracle.
float	None ❌	tracing	No Google crates. No JSON output.
liquidation	None ❌	tracing	No Google crates. No JSON output.

Low-Hanging Fruit

Structured JSON Logging Fleet-Wide

Free

Ensure all tracing-subscriber setups use .json() formatter. Cloud Logging auto-parses structured JSON into queryable, severity-aware fields.

Affects: broadcast, float, liquidation • Effort: ~5 min each
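To make the target shape concrete, the sketch below hand-builds a single-line JSON entry with `severity` and `message` keys, which Cloud Logging recognizes when a service writes JSON to stdout. The `log_line` and `escape_json` functions and the `service` key are hypothetical; real services would rely on tracing-subscriber's JSON formatter rather than this code.

```rust
/// Escape the characters that JSON requires to be escaped inside a string.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one single-line JSON log entry with `severity` and `message` keys.
pub fn log_line(severity: &str, service: &str, message: &str) -> String {
    format!(
        "{{\"severity\":\"{}\",\"service\":\"{}\",\"message\":\"{}\"}}",
        escape_json(severity),
        escape_json(service),
        escape_json(message)
    )
}
```

Printing `log_line("WARNING", "float", "position limit reached")` to stdout yields one JSON object per line, which is what lets the log backend filter by severity instead of grepping free text.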

Migrate oracle & quote to `tracing`

Free

Replace log + env_logger with tracing + tracing-subscriber (JSON). Unstructured text logs cannot be filtered by severity or field, which makes noisy output harder to find and exclude before it is billed.

Affects: oracle, quote • Effort: ~15 lines per service

Add OpenTelemetry for Cloud Trace

High Impact

Add tracing-opentelemetry layer to existing subscriber stacks. Gives distributed request traces across gateway → broadcast → on-chain. Cloud Trace free tier: 2.5M spans/month.

Affects: all services • Effort: ~30 min per service

Remove phantom `google-cloud-logging-v2`

Hygiene

Listed in gateway's Cargo.toml but never imported. Adds build time and binary size for nothing.

Affects: gateway • Effort: delete 1 line

Use official Secret Manager SDK

Cleanup

Gateway manually builds OAuth tokens, constructs REST URLs, and base64-decodes responses (~60 lines). The official google-cloud-secretmanager-v1 crate handles this in ~5 lines with built-in retries.

Affects: gateway • Effort: ~30 min

Log-Level Tuning for Cost Reduction

Savings

Tighten EnvFilter defaults: info for your crates, warn for deps. Cloud Logging charges $0.50/GiB after the free 50 GiB/month.

Affects: all services • Effort: env var change
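The pricing quoted above makes the saving easy to estimate. A small worked calculation follows; the monthly volumes in the usage note are hypothetical, not measurements from these services.

```rust
/// Monthly Cloud Logging ingestion cost in USD, using the figures quoted above:
/// the first 50 GiB per month are free, then $0.50 per GiB.
pub fn monthly_logging_cost_usd(ingested_gib: f64) -> f64 {
    const FREE_GIB: f64 = 50.0;
    const USD_PER_GIB: f64 = 0.50;
    (ingested_gib - FREE_GIB).max(0.0) * USD_PER_GIB
}
```

Under these numbers, trimming a hypothetical 250 GiB/month to 90 GiB/month would cut the bill from $100 to $20. An EnvFilter directive such as `warn,gateway=info` (where `gateway` stands for your own crate name) expresses the suggested defaults: warn globally, info for your code.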

Projected Impact

Current State

❌ No distributed tracing
❌ 2 services emit unstructured logs
❌ 60-line hand-rolled Secret Manager client
❌ Phantom dependency compiled into gateway
❌ No severity filtering in Cloud Logging

After Low-Hanging Fruit

✅ Full distributed tracing (Cloud Trace free tier)
✅ 6/6 services emit structured JSON
✅ Official SDK with retries & error handling
✅ Clean dependency tree
✅ Queryable, severity-aware logs → lower ingestion cost

Market Conditions

Simulate how "System Noise" (GC pauses, OS context switches, network jitter) destroys your tail latency while leaving the average mostly untouched.

System Noise / Jitter: Clean (Isolated Core) to Noisy (Shared Cloud)

Live Metrics (µs): Average, Median (P50), P99 (Tail)

Insight: Notice how the Average barely moves as noise increases, but P99 skyrockets. This is why cargo bench (which focuses on means) can be misleading for HFT.
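The effect can be reproduced without the interactive simulator. The sketch below is a deterministic stand-in, not the page's own simulation: the `simulate` function, the 10 µs base latency, and the rule that every Nth request is hit by a stall are all assumptions chosen to keep the arithmetic easy to follow.

```rust
/// Summary statistics over latency samples in microseconds.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub mean: f64,
    pub p50: u64,
    pub p99: u64,
}

/// Nearest-rank percentile on an already sorted, non-empty slice (pct in 1..=100).
fn percentile(sorted: &[u64], pct: usize) -> u64 {
    let rank = (pct * sorted.len() + 99) / 100; // ceil(pct / 100 * n)
    sorted[rank.max(1) - 1]
}

/// Simulate `n` requests: every `noisy_every`-th request is hit by a stall.
/// Pass `noisy_every = 0` for a clean run. Panics if `n` is zero.
pub fn simulate(n: usize, base_us: u64, stall_us: u64, noisy_every: usize) -> Summary {
    let mut samples: Vec<u64> = (0..n)
        .map(|i| {
            if noisy_every != 0 && i % noisy_every == 0 {
                base_us + stall_us
            } else {
                base_us
            }
        })
        .collect();
    samples.sort_unstable();

    let mean = samples.iter().sum::<u64>() as f64 / n as f64;
    Summary {
        mean,
        p50: percentile(&samples, 50),
        p99: percentile(&samples, 99),
    }
}
```

With 1,000 requests at 10 µs, stalling every 50th request by 190 µs moves the mean from 10 µs to 13.8 µs and leaves the median at 10 µs, while P99 jumps from 10 µs to 200 µs.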

[Figure: Latency Distribution (Log Scale). 1,000 simulated requests plotted as latency samples against the P99 threshold; the y-axis is logarithmic to show outliers.]
