Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Validation Cache

Status: Current Last modified: 2026-06-22 06:48 EDT

The CHAT-core validation cache, used by chatter validate and the LSP server. Distinct from the audio-task cache used by upstream batchalign3 for FA / UTR ASR / media conversion (documented separately in that project): this cache stores parse + validate results keyed by file path + options.

crates/talkbank-cache/.

Architecture

flowchart TD
    req["Validation request\n(path + options)"]
    key["Cache key\n(path_hash + RulesVersion + check_alignment + parser_kind)"]
    db["SQLite WAL\n~/.cache/talkbank-chat/\ntalkbank-cache.db"]
    hit["Cache hit\n→ return stored result"]
    miss["Cache miss\n→ parse + validate + store"]

    req --> key --> db
    db -->|"found + RulesVersion match + content_hash match"| hit
    db -->|"not found, rules changed, or content edited"| miss
    miss --> db

Configuration

ConfigValueWhy
BackendSQLite via sqlxConcurrent reads (WAL), atomic writes, zero-config
Pool size16 connectionsMatches validation worker count
mmap256 MBFast random access for 95k+ entries
InvalidationRules-version field + content hash + 30-day TTLRule-set or schema changes auto-invalidate; content edits invalidate per-file; stale entries pruned
BridgeEmbedded single-threaded tokio runtimeSync workers call rt.block_on() for async SQLite

Schema

file_cache table (see crates/talkbank-cache/migrations/20260101000000_initial.sql):

ColumnRole
path_hashBLAKE3 hash of the resolved path (part of the lookup key)
file_pathResolved file path, indexed for path-based maintenance ops
content_hashHash of the file content; mismatch invalidates the entry
versionCache-compatibility version (RulesVersion): the cache crate version folded together with a fingerprint of the active validation rule set. A mismatch invalidates the entry
cached_atInsertion timestamp
check_alignmentWhether alignment validation was requested
is_validCached validation outcome (0/1)
roundtrip_testedWhether roundtrip equivalence was checked
roundtrip_passedRoundtrip result when tested
parser_kindParser backend (tree-sitter or re2c)

The lookup key is the compound unique index (path_hash, version, check_alignment, parser_kind); file_path is a secondary index used by maintenance operations (orphan pruning, etc.).

Database location

PlatformPath
macOS~/Library/Caches/talkbank-chat/talkbank-cache.db
Linux~/.cache/talkbank-chat/talkbank-cache.db
Windows%LocalAppData%\talkbank-chat\talkbank-cache.db

Invalidation

  • Validation-rule changes: the version column holds a RulesVersion, which folds the talkbank-cache crate version together with a fingerprint of the active validation rule set (an FNV-1a hash over every ErrorCode the validator can emit, via talkbank_model::validation_rules_fingerprint). Adding, removing, or renaming a rule (for example introducing error code E370, “retrace marker must be followed by material”) changes the fingerprint, hence the RulesVersion, hence the lookup key, so verdicts cached under the old rule set become a cache MISS and are re-validated instead of served stale. This is the mechanism that keeps chatter validate (the authority on CHAT validity) from returning a stale “Valid” after the rules tighten. The stale rows stay on disk under their old version for selective re-testing; they are simply never served to a query carrying the new version.
  • Content changes: each entry stores the file’s content_hash; a mismatch is a per-file miss.
  • Time-based: entries older than 30 days are pruned.
  • Manual: pass --force to bypass cache lookups for a particular validation run.

Per repository policy, do not delete the cache directory without explicit request. Use --force when you want fresh validation for specific paths without destroying the whole cache.

See also

  • Upstream batchalign3 documents its own audio-task cache for FA / UTR ASR / media conversion.