Rust Compilation Times: Findings and Optimizations
Status: Reference (historical analysis; current Cargo.toml profile knobs are the source of truth)
Last updated: 2026-05-20 20:32 EDT
This document captures the compilation performance analysis that drove the
current dev/test profile knobs in the workspace root Cargo.toml. The
absolute measurements below were taken before the 2026-04-28 batchalign3
fold roughly tripled the third-party dependency surface; subsequent updates
are reflected in Cargo.toml comments, which are the source of truth.
Background: How Rust Compilation Works
Rust compilation has two key mechanisms for speed:
-
Incremental compilation: When you change one file and rebuild, the compiler remembers which “codegen units” within each crate were affected and only recompiles those. This is the primary speedup mechanism for local iterative development (edit-compile-test cycles).
-
Crate-level caching: Cargo tracks which crates have changed inputs (source files, dependencies, feature flags). Unchanged crates are skipped entirely. This helps when you edit a leaf crate and don’t need to rebuild unrelated crates.
Additionally, there are external tools:
-
sccache: A shared compilation cache that stores compiled artifacts by content hash. Designed for CI environments where builds start from a clean state. It works by wrapping rustc and checking a cache before invoking the real compiler.
-
Linker choice: The linker runs after all crates are compiled to produce the final binary. Faster linkers (like lld) can shave seconds off link time for large binaries.
What We Found
Problem 1: sccache Was Disabling Incremental Compilation (Critical)
The global ~/.cargo/config.toml had:
[build]
rustc-wrapper = "/opt/homebrew/bin/sccache"
This caused two compounding problems:
-
sccache disables Rust incremental compilation entirely. When a rustc-wrapper is set, Cargo cannot use incremental mode because the wrapper interposes between Cargo and rustc, breaking the incremental artifact protocol.
-
sccache had near-zero cache benefit for this workspace. The sccache stats showed a 2.7% Rust cache hit rate. Out of 37 compilations, 36 were marked “non-cacheable” because rlib crates (library crates, which is what most workspace crates produce) cannot be cached by sccache.
The result: every cargo build after a one-line change was effectively a clean
rebuild of the entire dependency chain. A change to talkbank-model (near the
root of the crate graph) triggered a full recompile of 11+ downstream crates,
taking 60-90 seconds even for a trivial edit.
Problem 2: Full Debug Info Was Inflating Link Times
The dev profile was generating full DWARF debug info (level 2), which includes:
- Type definitions for every struct/enum
- Variable location info for debugger inspection
- Full scope and lifetime metadata
This produces large .dSYM bundles and .o files, increasing linker input size
and slowing down the link phase.
Problem 3: Third-Party Dependencies at -O0
All third-party crates (serde, regex, tree-sitter, etc.) were compiled at
opt-level = 0 in dev builds. Since these crates rarely change, this was a
pure penalty: slow runtime (tests using serde deserialization, tree-sitter
parsing, or regex matching ran ~10x slower than necessary) with no compile-time
benefit after the first build.
Non-Problem: lld Linker
The linker = "lld" setting in the global cargo config was fine. On macOS this
uses ld64.lld from Homebrew’s LLVM toolchain (LLD 21.1.8), which is slightly
faster than Apple’s default linker for workspaces of this size. No change needed.
Changes Made
Change 1: Project-Local sccache Override
Created .cargo/config.toml in the project root:
[build]
rustc-wrapper = ""
This overrides the global sccache setting for this project only, re-enabling incremental compilation. Other Rust projects on the system are unaffected.
Why not modify the global config? Keeping the override project-local is safer: sccache may still be useful for other projects or CI workflows.
Note: .cargo/config.toml is gitignored (not committed) because the
empty-string rustc-wrapper = "" value trips a cargo-llvm-cov bug that
treats "" as a real wrapper path instead of “no wrapper.” Each
contributor opts in locally; CI does not carry the override.
Change 2: Reduced Debug Info
In the workspace Cargo.toml:
[profile.dev]
debug = "line-tables-only"
[profile.test]
debug = "line-tables-only"
This generates only file/line number information for backtraces, skipping the bulky type and variable metadata. You still get useful panic/backtrace output with source locations; you just can’t inspect local variables in a debugger (lldb/gdb). For most development workflows this is the right tradeoff.
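When a debugger session is needed, full debug info can be restored temporarily without touching the committed profiles. A minimal sketch, assuming the override goes in a contributor's local .cargo/config.toml (Cargo lets a [profile] table in config files override Cargo.toml); this placement is an illustration, not something the workspace prescribes:

```toml
# .cargo/config.toml (local, uncommitted): hypothetical placement
# Temporarily restore full DWARF debug info for lldb/gdb sessions.
[profile.dev]
debug = "full"   # same level as debug = 2 / debug = true

# The workspace sets the test profile explicitly, so override it as well.
[profile.test]
debug = "full"
```

Remove the override afterwards so everyday builds return to line-tables-only and the shorter link times.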
Change 3: Optimized Third-Party Dependencies (RETIRED post-fold)
The original change set [profile.dev.package."*"] opt-level = 1 to
optimize every third-party crate. After the 2026-04-28 batchalign3 fold
roughly tripled the third-party dependency surface (axum, async-trait,
tokio’s full feature set, etc.), the build-time cost of this setting
became prohibitive, and the workspace Cargo.toml comment block now
explains why it was removed.
[profile.test.package."*"] opt-level = 1 was also removed for the
same reason; for specific tests where runtime is the bottleneck, opt
in locally rather than reintroducing the workspace-wide setting.
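One way to opt in locally is a per-package override that names only the dependencies on the hot path. A minimal sketch, assuming regex and tree-sitter are the bottleneck for the test at hand; the choice of crates and the placement in a local .cargo/config.toml are illustrative assumptions:

```toml
# .cargo/config.toml (local, uncommitted): hypothetical example
# Optimize only the named dependencies instead of every third-party crate.
[profile.dev.package.regex]
opt-level = 1

[profile.dev.package.tree-sitter]
opt-level = 1

# If the test profile needs the setting spelled out explicitly, repeat the
# same tables under [profile.test.package.<name>].
```

Only the named crates are compiled at the higher opt-level. Expect a one-time rebuild of those crates and their dependents after adding the override.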
Results (pre-fold, 2026-03 measurement)
The numbers below were captured pre-fold against the original ten-crate
workspace. The fold roughly tripled the third-party dep set and forced
retiring [profile.dev.package."*"] opt-level = 1; today’s wall-clock
will be slower and depends on which crate you touched. Re-run
cargo build --timings on the current workspace if you need fresh
numbers.
| Scenario | Before | After (pre-fold) |
|---|---|---|
| Clean build | ~3-5 min (est.) | ~39s |
| Incremental rebuild (touch talkbank-model) | ~60-90s | ~4s |
| Test runtime (serde/regex/tree-sitter hot paths) | Slow (-O0) | Faster (-O1, when opt-in) |
Optional: Cranelift Backend for Maximum Iteration Speed
For the fastest possible “does it compile?” checks during rapid iteration, Rust nightly supports the Cranelift codegen backend:
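rustup component add rustc-codegen-cranelift-preview --toolchain nightly
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift cargo +nightly build -Zcodegen-backend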
Cranelift generates code ~2x faster than LLVM but produces unoptimized output and is nightly-only. It is useful for compile-check cycles but not for correctness testing or benchmarking.
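For repeated use, the backend can also be selected through configuration rather than per-invocation settings. A minimal sketch following the setup documented by the rustc_codegen_cranelift project, assuming a nightly toolchain with the rustc-codegen-cranelift-preview component installed and a local, uncommitted .cargo/config.toml:

```toml
# .cargo/config.toml (local, uncommitted): nightly only
[unstable]
codegen-backend = true          # enables the unstable profile option

[profile.dev]
codegen-backend = "cranelift"   # use Cranelift instead of LLVM for dev builds
```

Because the option is unstable, this only takes effect when building with cargo +nightly. A stable toolchain is likely to reject the profile key, so keep it out of committed configuration.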
General Principles for Rust Compile Time
-
Incremental compilation is king for local dev. Anything that disables it (sccache, certain rustc-wrapper tools) is a net negative for iterative development.
-
sccache is for CI, not local dev. It shines when doing clean builds from scratch (CI runners, cross-compilation). For edit-rebuild cycles, incremental compilation is far more valuable.
-
Optimize dependencies, not your own crates. [profile.dev.package."*"] with opt-level = 1 gives you faster test execution with minimal recurring compile cost (dependencies rarely change), as long as the dependency set stays small. This workspace retired the setting once the fold roughly tripled its third-party dependencies.
-
Debug info has a real cost. Full DWARF debug info inflates binary sizes and link times. Use line-tables-only unless you actively need a debugger.
-
Measure before optimizing. Use cargo build --timings to generate an HTML report showing per-crate compile times and parallelism. Use sccache --show-stats to verify cache effectiveness.
-
Watch for crate graph bottlenecks. Crates that sit at the root of the dependency graph (like talkbank-model) are the critical path: changes to them trigger the longest rebuild chains. Keep these crates lean and consider splitting them if they grow too large.