CA Terminator Resolution
Status: Current
Last updated: 2026-05-05 12:23 EDT
How CA markers are split between separators and linkers in the parser/model.
Current rule
The parser/model no longer promotes CA markers into utterance terminators.
The supported split is:
- Standard utterance terminators remain the CHAT terminators, such as `.`, `?`, `!`, `+...`, `+/.`, and related final punctuation tokens.
- CA intonation arrows (`⇗ ↗ → ↘ ⇘`) stay `Separator` content items.
- CA TCU markers (`≈ ≋`) stay `Separator` content items.
- CA TCU linker forms (`+≈ +≋`) stay `Linker` items.
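The three-way split can be sketched as a token classifier. This is a minimal sketch, not the parser's code: `TokenKind` and `classify` are hypothetical names, and the function assumes its input has already been tokenized into single CHAT tokens.

```rust
/// Hypothetical token categories mirroring the Terminator / Separator / Linker split.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Terminator,
    Separator,
    Linker,
}

/// Standard CHAT terminators (the Terminator row of the Data Model table).
const TERMINATORS: &[&str] = &[
    ".", "?", "!", "+...", "+/.", "+//.", "+/?", "+!?", "+\"/.", "+\".", "+//?", "+..?", "+.",
];

/// CA intonation arrows and CA TCU markers: both stay separators.
const CA_SEPARATORS: &[&str] = &["⇗", "↗", "→", "↘", "⇘", "≈", "≋"];

/// CA TCU linker forms: the `+` prefix makes them linkers, not separators.
const CA_LINKERS: &[&str] = &["+≈", "+≋"];

/// Classifies one already-tokenized CHAT token.
/// Returns `None` for anything outside the terminator/CA subset covered here.
pub fn classify(token: &str) -> Option<TokenKind> {
    if TERMINATORS.contains(&token) {
        Some(TokenKind::Terminator)
    } else if CA_SEPARATORS.contains(&token) {
        Some(TokenKind::Separator)
    } else if CA_LINKERS.contains(&token) {
        Some(TokenKind::Linker)
    } else {
        None
    }
}
```

Note that no CA form appears in `TERMINATORS`, so no input can be classified as both a CA marker and a terminator.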
Under this split, a trailing `→`, `≈`, or `≋` remains in main-tier content rather than being retyped as a `Terminator`.
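The effect on a single utterance can be sketched as follows. `ContentItem`, `Utterance`, and `build_utterance` are hypothetical stand-ins for the real model types; the sketch only shows that a genuine CHAT terminator in final position is split off, while a trailing CA marker stays in content and no later pass promotes it.

```rust
/// Hypothetical main-tier item: ordinary content or a separator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ContentItem {
    Word(String),
    Separator(String),
}

/// Hypothetical utterance shape: content plus an optional CHAT terminator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Utterance {
    pub content: Vec<ContentItem>,
    pub terminator: Option<String>,
}

const TERMINATORS: &[&str] = &[
    ".", "?", "!", "+...", "+/.", "+//.", "+/?", "+!?", "+\"/.", "+\".", "+//?", "+..?", "+.",
];
const SEPARATORS: &[&str] = &["⇗", "↗", "→", "↘", "⇘", "≈", "≋"];

/// Builds an utterance from tokens. Only a real CHAT terminator in final
/// position is peeled off; there is deliberately no second pass that
/// promotes a trailing separator into the terminator slot.
pub fn build_utterance(tokens: &[&str]) -> Utterance {
    let (body, terminator) = match tokens.split_last() {
        Some((last, rest)) if TERMINATORS.contains(last) => (rest, Some(last.to_string())),
        _ => (tokens, None),
    };
    let content = body
        .iter()
        .map(|t| {
            if SEPARATORS.contains(t) {
                ContentItem::Separator(t.to_string())
            } else {
                ContentItem::Word(t.to_string())
            }
        })
        .collect();
    Utterance { content, terminator }
}
```

For example, `build_utterance(&["yeah", "→"])` yields no terminator and a final `ContentItem::Separator("→")`, which is the behavior the trailing-separator regression tests describe.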
Parser/model consequences
- The tree-sitter grammar keeps arrows and `≈`/`≋` on the `separator` path.
- The tree parser converts those nodes directly into `Separator` variants.
- The re2c parser classifies `≈`/`≋` as separators and `+≈`/`+≋` as linkers.
- The old post-hoc `resolve_ca_terminator()` promotion pass was removed.
- `Terminator::try_from_chat_str()` intentionally rejects CA arrows, `≈`, `≋`, `+≈`, and `+≋`.
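The rejection behavior of `Terminator::try_from_chat_str()` can be sketched like this. Only the method name comes from this document; the variant names, the `TerminatorError` type, and the choice to report CA forms with a dedicated error variant are assumptions, and most of the CHAT terminators listed under Data Model are omitted for brevity.

```rust
/// Hypothetical subset of the Terminator type; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Terminator {
    Period,       // .
    Question,     // ?
    Exclamation,  // !
    TrailingOff,  // +...
    Interruption, // +/.
    // Remaining CHAT terminators omitted from this sketch.
}

/// Hypothetical error type for strings that are not terminators.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TerminatorError {
    /// A CA arrow, TCU marker, or TCU linker: valid CHAT, but never a terminator.
    CaMarker(String),
    /// Anything else that is not a recognized terminator.
    Unknown(String),
}

const CA_FORMS: &[&str] = &["⇗", "↗", "→", "↘", "⇘", "≈", "≋", "+≈", "+≋"];

impl Terminator {
    /// Accepts only standard CHAT terminators; CA forms are rejected outright.
    pub fn try_from_chat_str(s: &str) -> Result<Self, TerminatorError> {
        match s {
            "." => Ok(Terminator::Period),
            "?" => Ok(Terminator::Question),
            "!" => Ok(Terminator::Exclamation),
            "+..." => Ok(Terminator::TrailingOff),
            "+/." => Ok(Terminator::Interruption),
            _ if CA_FORMS.contains(&s) => Err(TerminatorError::CaMarker(s.to_string())),
            _ => Err(TerminatorError::Unknown(s.to_string())),
        }
    }
}
```

A separate `CaMarker` error (again, an assumption of this sketch) lets a caller distinguish "valid CA marker used where a terminator was expected" from an unrecognized string.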
Data Model
The active surface split is:
| Kind | CHAT tokens |
|---|---|
| Terminator | . ? ! +... +/. +//. +/? +!? +"/. +". +//? +..? +. |
| Separator | ⇗ ↗ → ↘ ⇘ ≈ ≋ plus the other CA/content separators |
| Linker | +≈ +≋ plus the other utterance linkers |
Legacy CA-only `Terminator` variants still exist in the type for backward compatibility with older serialized data, but new parser and classifier code does not construct them from CHAT text.
Regression coverage
The regression surface for this split is:
- `ca_symbols_are_not_chat_terminators` in `talkbank-model`
- `trailing_ca_arrow_stays_separator` in `talkbank-parser`
- `trailing_ca_no_break_stays_separator` in `talkbank-parser`
- `trailing_ca_technical_break_stays_separator` in `talkbank-parser`