Validation Cache
Status: Current Last modified: 2026-06-22 06:48 EDT
The CHAT-core validation cache, used by chatter validate and the
LSP server. Distinct from the audio-task cache used by upstream
batchalign3 for FA / UTR ASR / media conversion (documented
separately in that project): this cache stores parse + validate
results keyed by file path + options.
crates/talkbank-cache/.
Architecture
flowchart TD
req["Validation request\n(path + options)"]
key["Cache key\n(path_hash + RulesVersion + check_alignment + parser_kind)"]
db["SQLite WAL\n~/.cache/talkbank-chat/\ntalkbank-cache.db"]
hit["Cache hit\n→ return stored result"]
miss["Cache miss\n→ parse + validate + store"]
req --> key --> db
db -->|"found + RulesVersion match + content_hash match"| hit
db -->|"not found, rules changed, or content edited"| miss
miss --> db
Configuration
| Config | Value | Why |
|---|---|---|
| Backend | SQLite via sqlx | Concurrent reads (WAL), atomic writes, zero-config |
| Pool size | 16 connections | Matches validation worker count |
mmap | 256 MB | Fast random access for 95k+ entries |
| Invalidation | Rules-version field + content hash + 30-day TTL | Rule-set or schema changes auto-invalidate; content edits invalidate per-file; stale entries pruned |
| Bridge | Embedded single-threaded tokio runtime | Sync workers call rt.block_on() for async SQLite |
Schema
file_cache table (see
crates/talkbank-cache/migrations/20260101000000_initial.sql):
| Column | Role |
|---|---|
path_hash | BLAKE3 hash of the resolved path (part of the lookup key) |
file_path | Resolved file path, indexed for path-based maintenance ops |
content_hash | Hash of the file content; mismatch invalidates the entry |
version | Cache-compatibility version (RulesVersion): the cache crate version folded together with a fingerprint of the active validation rule set. A mismatch invalidates the entry |
cached_at | Insertion timestamp |
check_alignment | Whether alignment validation was requested |
is_valid | Cached validation outcome (0/1) |
roundtrip_tested | Whether roundtrip equivalence was checked |
roundtrip_passed | Roundtrip result when tested |
parser_kind | Parser backend (tree-sitter or re2c) |
The lookup key is the compound unique index
(path_hash, version, check_alignment, parser_kind); file_path is a
secondary index used by maintenance operations (orphan pruning, etc.).
Database location
| Platform | Path |
|---|---|
| macOS | ~/Library/Caches/talkbank-chat/talkbank-cache.db |
| Linux | ~/.cache/talkbank-chat/talkbank-cache.db |
| Windows | %LocalAppData%\talkbank-chat\talkbank-cache.db |
Invalidation
- Validation-rule changes: the
versioncolumn holds aRulesVersion, which folds thetalkbank-cachecrate version together with a fingerprint of the active validation rule set (an FNV-1a hash over everyErrorCodethe validator can emit, viatalkbank_model::validation_rules_fingerprint). Adding, removing, or renaming a rule (for example introducing error code E370, “retrace marker must be followed by material”) changes the fingerprint, hence theRulesVersion, hence the lookup key, so verdicts cached under the old rule set become a cache MISS and are re-validated instead of served stale. This is the mechanism that keepschatter validate(the authority on CHAT validity) from returning a stale “Valid” after the rules tighten. The stale rows stay on disk under their old version for selective re-testing; they are simply never served to a query carrying the new version. - Content changes: each entry stores the file’s
content_hash; a mismatch is a per-file miss. - Time-based: entries older than 30 days are pruned.
- Manual: pass
--forceto bypass cache lookups for a particular validation run.
Per repository policy, do not delete the cache directory without explicit
request. Use --force when you want fresh validation for specific paths
without destroying the whole cache.
See also
- Upstream
batchalign3documents its own audio-task cache for FA / UTR ASR / media conversion.