Testing
Status: Current Last modified: 2026-06-15 15:00 EDT
Test Generation Pipeline
Specs are the source of truth. All grammar corpus tests, Rust parser tests,
and error docs are generated from specs. This repo does not currently
have the old monorepo-wide make test-gen wrapper; run the relevant
spec/tools binaries directly instead, and never hand-edit generated files.
flowchart LR
subgraph sources["Source of Truth"]
constructs["spec/constructs/\n(construct specs, see directory listing)"]
errors["spec/errors/\n(error specs, see directory listing)"]
templates["spec/tools/templates/\n(Tera wrappers)"]
end
subgraph generators["spec/tools generators\n(run only what changed)"]
gen_ts["gen_tree_sitter_tests"]
gen_rust["gen_rust_tests"]
gen_validation["gen_validation_corpus"]
gen_docs["gen_error_docs"]
end
subgraph outputs["Generated Outputs (DO NOT EDIT)"]
ts_tests["grammar/test/corpus/\n(tree-sitter tests)"]
rust_tests["crates/talkbank-parser-tests/tests/generated/\n(Rust tests)"]
val_corpus["crates/talkbank-parser-tests/tests/error_corpus/validation_errors/\n(.cha fixtures + manifest.json)"]
error_docs["docs/errors/\n(local generated error pages)"]
end
constructs & errors --> gen_ts
templates --> gen_ts
constructs & errors --> gen_rust
errors --> gen_validation
errors --> gen_docs
gen_ts --> ts_tests
gen_rust --> rust_tests
gen_validation --> val_corpus
gen_docs --> error_docs
To add a grammar test or error test, add a spec file in spec/constructs/
or spec/errors/, then run the current generator command(s) from
Spec Workflow. Use only the binaries that match the
artifacts you intentionally changed.
Test Strategy
Testing is organized in layers, from fastest to most comprehensive.
flowchart TD
unit["Unit + Integration Tests\n(cargo nextest run)"]
specgen["Spec-Generated Tests\n(spec/tools generators)\nParser + validation layer"]
grammar["Grammar Corpus\n(tree-sitter test)"]
ref["Reference Corpus\n(corpus/reference/, 100% required)"]
gates["Local verification sweep + CI\n(dev-checks.md / quality-gates.md)"]
unit --> specgen --> grammar --> ref --> gates
Never-Regress Gates
Four gates form the regression contract for the CHAT core. They guard the
behavior a successor cannot easily re-derive: parser correctness, lossless
serialization, full-corpus coverage, and error detection. Any commit
touching the relevant surface (grammar, parser, model, validation,
serialization, or alignment) MUST run the matching gate(s) and keep them
green. A red gate is a bug until proven otherwise (see the repo CLAUDE.md,
“Test Failures Are Bugs Until Proven Otherwise”), never a test expectation
to quietly update.
| Gate | Command | What it protects |
|---|---|---|
| Parser equivalence | cargo nextest run -p talkbank-parser-tests -E 'test(parser_equivalence)' | The re2c oracle parser and the tree-sitter parser agree on every reference file. A divergence means one parser is wrong, or a construct spec is missing. |
| Roundtrip idempotency | cargo nextest run -p talkbank-parser-tests --test roundtrip_reference_corpus | parse, serialize, re-parse yields a semantically identical AST (SemanticEq) for every reference file. Catches any model or WriteChat change that silently loses information. |
| Reference corpus 100% | (the same roundtrip_reference_corpus test) | Every file under corpus/reference/ parses and roundtrips with zero failures. The reference corpus is the ultimate arbiter of full-file correctness; it must be 100%, never “mostly”. |
| Error-code spec tests | cargo nextest run -p talkbank-parser-tests --test generated_tests --test validation_error_corpus --test error_coverage | Every error spec under spec/errors/ still fires its expected code: parser-layer errors reject as designed, validation-layer errors are detected, and every ErrorCode has a backing spec. These tests are generated from specs, never hand-written. |
Two of the four share one test: roundtrip_reference_corpus enforces both
roundtrip idempotency and the reference-corpus 100% guarantee, because it
iterates every reference file (the coverage guarantee) and checks roundtrip
semantic equality on each (the idempotency guarantee).
All four also run as part of the full workspace sweep
(cargo nextest run --workspace), so a complete local run before committing
covers them. The per-gate commands above are the fast, targeted way to
re-check one surface during the inner development loop. The sections below
describe each layer in more detail.
Unit Tests (nextest)
cargo nextest run
Runs all unit and integration tests across all crates (~2300+ tests). These test individual functions, serialization roundtrips, and model invariants.
cargo nextest does not run doctests. Keep cargo test --doc as a separate
verification step when you change public API examples or doc comments.
Parser Equivalence
cargo nextest run -p talkbank-parser-tests -E 'test(parser_equivalence)'
Runs the parser on each file in the corpus/reference/ tree and validates
results. Each .cha file is its own test, enabling per-file parallelism and
failure isolation via nextest. The exact file count is whatever
find corpus/reference -name '*.cha' -type f | wc -l reports, do not
hard-code it here.
Spec-Generated Tests
Part of talkbank-parser-tests. These are generated from specs via the
current spec/tools binaries and currently test:
- Construct specs: input parses correctly
- Parser-layer error specs: input fails to parse with expected error code
- Validation-layer error specs: input parses but validation reports expected error code
Common entrypoints from the repository root:
cargo run --manifest-path spec/tools/Cargo.toml --bin gen_tree_sitter_tests -- \
--output-dir grammar/test/corpus \
--template-dir spec/tools/templates
cargo run --manifest-path spec/tools/Cargo.toml --bin gen_rust_tests -- \
--output-dir crates/talkbank-parser-tests/tests/generated
cargo run --manifest-path spec/tools/Cargo.toml --bin gen_validation_corpus -- \
--corpus-dir crates/talkbank-parser-tests/tests/error_corpus/validation_errors
Tree-Sitter Grammar Tests
cd grammar && tree-sitter test
Runs the tree-sitter grammar corpus tests. This is the right gate for grammar structure changes.
Error Corpus Tests
Error fixtures live in spec/errors/. Parser-layer error examples become Rust
tests via gen_rust_tests; validation-layer examples become a .cha fixture
corpus + manifest.json via gen_validation_corpus, under
crates/talkbank-parser-tests/tests/error_corpus/validation_errors/, which the
data-driven runner validation_error_corpus.rs consumes. Add a new error spec
under spec/errors/E###_*.md and regenerate.
Tree-Sitter Tests
cd grammar
tree-sitter test
Verifies the grammar produces correct CSTs for known inputs. The
actual test count comes from ls grammar/test/corpus/*.txt | wc -l;
do not hard-code it.
Reference Corpus
The reference corpus at corpus/reference/ is organized into subdirs
(annotation/, audio/, ca/, content/, core/, edge-cases/,
languages/, tiers/, word-features/). The parser must handle
every file at 100%, the exact file count is whatever
find corpus/reference -name '*.cha' -type f | wc -l reports.
This corpus is the ultimate arbiter of correctness for full-file parsing.
Local Verification Contract
There is no repo-local make verify wrapper in this checkout today.
Use the explicit command set from
Developer Verification Checks and
Testing and Quality Gates instead.
Core local sweep:
cargo fmt --all -- --check
cargo build --workspace --all-targets --locked
cargo nextest run --workspace
cargo test --doc
Then add the surface-specific checks that match your change:
- grammar changes:
cd grammar && tree-sitter generate && tree-sitter test - spec-tool changes:
cargo build --manifest-path spec/tools/Cargo.tomlandcargo build --manifest-path spec/runtime-tools/Cargo.toml - parser / model / alignment / serialization changes:
cargo nextest run -p talkbank-parser-tests -E 'test(parser_equivalence)'andcargo nextest run -p talkbank-parser-tests --test roundtrip_reference_corpus
Running Specific Tests
# Single test by name
cargo nextest run test_name
# Tests in a specific crate
cargo nextest run -p talkbank-model
# Tests matching a pattern
cargo nextest run -- mor
# With output
cargo nextest run --no-capture
What to Run When
| What you changed | Run |
|---|---|
Grammar (grammar.js) | cd grammar && tree-sitter generate && tree-sitter test, then the relevant parser/spec-generator commands |
| Parser (CST-to-model) | cargo nextest run -p talkbank-parser |
| Model (types, validation, alignment) | cargo nextest run -p talkbank-model |
| CLI (chatter args, dispatch) | cargo nextest run -p chatter |
| LSP | cargo nextest run -p talkbank-lsp |
| Spec files | Run the relevant gen_* commands from spec/tools, then the local verification sweep from dev-checks.md |
| Pre-merge (any change) | The local verification sweep from dev-checks.md plus surface-specific additions |
| Pre-push (quick) | Re-run the narrowest commands that cover the surfaces you touched; there is no repo-local make ci-local wrapper |
Mutation Testing
Use cargo-mutants to find code that can be changed without any test failing, the true coverage gaps.
# Install (once)
cargo install cargo-mutants
# Run against a specific crate (--jobs 1 to avoid OOM on 64 GB machines)
cargo mutants -p talkbank-parser --timeout 120 --jobs 1
# Review results
cat mutants.out/missed.txt # Mutations no test caught
cat mutants.out/caught.txt # Mutations properly detected
Mutation testing is not part of CI but should be run periodically (after major changes) to find untested logic paths. Results guide where to add new tests.
Configuration: mutants.toml at the repo root excludes trivial functions.
Adding Tests
- Model tests: add to the relevant crate’s
tests/directory or#[cfg(test)]module - Parser tests: if the change is about grammar shape or validation contracts,
add or update specs and regenerate with the relevant
spec/toolsgenerator binaries - Error tests: add a new spec under
spec/errors/E###_*.mdand rungen_rust_tests(parser-layer) orgen_validation_corpus(validation-layer); the generated Rust tests / fixture corpus + manifest are produced automatically