Symbols

Status: Reference Last modified: 2026-05-29 18:43 EDT

CHAT uses a rich set of symbols for transcription conventions. This page documents the symbol categories and the symbol registry that drives both the grammar and the Rust crates. The symbol registry (spec/symbols/symbol_registry.json) is the source of truth: when this page and the registry disagree, the registry wins.

Symbol Registry

The authoritative symbol definitions live in spec/symbols/symbol_registry.json. This JSON file is the single source of truth. The following are generated from it:

  • Character sets for the tree-sitter grammar (grammar.js)
  • Rust constants for the model and validation crates
  • Validation rules for the spec tool

After any change to the symbol registry, run:

just symbols-gen

Symbol Categories

Terminators

Punctuation that ends an utterance:

| Symbol | Name | Usage |
|---|---|---|
| . | Period | Declarative |
| ? | Question | Interrogative |
| ! | Exclamation | Exclamatory |
| +... | Trailing off | Incomplete utterance |
| +..? | Trailing-off question | Question trails off |
| +/. | Interruption | Speaker interrupted by another |
| +//. | Self-interruption | Speaker interrupts self |
| +/? | Interrupted question | Question interrupted |
| +!? | Broken question | Exclamation-question |
| +"/. | Quoted new line | Quotation continues on next line |
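
The table above can be turned into a small lookup. The following is a minimal sketch, not the crates' actual implementation: the name `find_terminator` and the longest-first matching strategy are assumptions made for illustration, and the real terminator set should be taken from the registry.

```rust
/// Terminators from the table above, ordered longest-first so that a
/// multi-character terminator such as `+//.` is tried before plain `.`.
/// (Hypothetical constant; the real set is generated from the registry.)
const TERMINATORS: [(&str, &str); 10] = [
    ("+\"/.", "Quoted new line"),
    ("+//.", "Self-interruption"),
    ("+...", "Trailing off"),
    ("+..?", "Trailing-off question"),
    ("+/.", "Interruption"),
    ("+/?", "Interrupted question"),
    ("+!?", "Broken question"),
    (".", "Period"),
    ("?", "Question"),
    ("!", "Exclamation"),
];

/// Returns `(symbol, name)` for the terminator that ends `utterance`,
/// or `None` if the text does not end in a listed terminator.
pub fn find_terminator(utterance: &str) -> Option<(&'static str, &'static str)> {
    let trimmed = utterance.trim_end();
    TERMINATORS
        .iter()
        .copied()
        .find(|(symbol, _)| trimmed.ends_with(*symbol))
}
```

Ordering the table longest-first matters: `+...` also ends in `.`, so checking the single-character terminators first would misclassify every compound terminator as a plain period.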

CA (Conversation Analysis) Symbols

CA notation symbols fall into three parser-distinct categories in spec/symbols/symbol_registry.json. They are not interchangeable: the grammar treats them as different node kinds.

CA element symbols (ca_element_symbols) attach to a word, so book↑ is a single token whose content carries the symbol:

| Symbol | Meaning |
|---|---|
|  | Rising pitch (attaches to a word) |
|  | Falling pitch (attaches to a word) |
|  | Micropause |
|  | Inhalation marker |
|  | Other CA element symbols |

CA arrow separators (in word_segment_forbidden_start_symbols) are own-node separators between words, not word-attachments. The parser splits them as their own nodes:

| Symbol | Meaning |
|---|---|
|  | Level pitch contour |
|  | Rising-to-mid contour |
|  | Falling-to-mid contour |
|  | Rising-to-high contour |
|  | Falling-to-low contour |
|  | Other CA arrow separators |

CA delimiter symbols (ca_delimiter_symbols) bracket annotated prosodic regions:

| Symbol | Meaning |
|---|---|
| ° | Quiet speech |
|  | Higher / lower pitch register |
|  | Other prosodic-region delimiters |
|  | Low / high prosodic-region delimiters |
| § Ϋ | Additional registered CA delimiters |

Confirm the current contents of each category by reading spec/symbols/symbol_registry.json directly. That is the file from which just symbols-gen derives the grammar and the Rust constants.

Word Segment Characters

The registry defines sets of characters that are forbidden at the start of a word, forbidden in the rest of a word, or forbidden throughout. These sets define the lexical boundaries of what constitutes a “word” in CHAT.

The grammar uses these sets to construct the word-matching regex patterns. Characters like [, ], <, >, (, ) are structural delimiters and cannot appear inside words.
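
As an illustration of the structural-delimiter rule, the sketch below scans a candidate word for the six delimiters named above. It is a simplification under stated assumptions: `STRUCTURAL_DELIMITERS` and `first_structural_delimiter` are hypothetical names, and the real forbidden-character sets (which are larger and distinguish word start from word rest) come from the registry.

```rust
/// Only the six delimiters named in the text above. The generated
/// sets in the real crates are assumed to contain more characters.
const STRUCTURAL_DELIMITERS: [char; 6] = ['[', ']', '<', '>', '(', ')'];

/// Returns the byte offset and character of the first structural
/// delimiter found in `candidate`, or `None` if it contains none.
pub fn first_structural_delimiter(candidate: &str) -> Option<(usize, char)> {
    candidate
        .char_indices()
        .find(|(_, ch)| STRUCTURAL_DELIMITERS.contains(ch))
}
```

Returning the offset rather than a plain boolean lets a caller report where the word boundary was violated.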

Event Segment Characters

A separate set lists the characters forbidden in event descriptions (the content that follows &=). Events have slightly different lexical rules than words.

Language Codes

CHAT uses ISO 639-3 three-letter language codes in @Languages headers and @s: word markers:

@Languages:	eng, fra
*CHI:	I want a croissant@s:fra .

Common codes: eng (English), fra (French), deu (German), spa (Spanish), zho (Chinese), jpn (Japanese).
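
The two places where language codes appear can be parsed with a few string operations. This is a rough sketch with hypothetical function names, not the model crate's API. It handles only the simple `word@s:code` form shown in the example, and it checks only the shape of a code (three ASCII lowercase letters), not whether the code is actually registered in ISO 639-3.

```rust
/// Parses an `@Languages:` header line into its language codes.
/// Returns `None` if the line is not an `@Languages:` header.
pub fn parse_languages_header(line: &str) -> Option<Vec<String>> {
    let rest = line.strip_prefix("@Languages:")?;
    let codes = rest
        .split(',')
        .map(|code| code.trim().to_string())
        .filter(|code| !code.is_empty())
        .collect();
    Some(codes)
}

/// Shape check only: exactly three ASCII lowercase letters.
pub fn looks_like_iso639_3(code: &str) -> bool {
    code.len() == 3 && code.bytes().all(|b| b.is_ascii_lowercase())
}

/// Splits a word such as `croissant@s:fra` into `("croissant", "fra")`.
/// Returns `None` when there is no `@s:` marker or the code is malformed.
pub fn split_language_marker(word: &str) -> Option<(&str, &str)> {
    let (base, code) = word.split_once("@s:")?;
    if base.is_empty() || !looks_like_iso639_3(code) {
        return None;
    }
    Some((base, code))
}
```

Applied to the example above, the header yields the codes eng and fra, and the word croissant@s:fra splits into the base word croissant and the code fra.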

Special Markers

@ Markers (Word-Level)

The authoritative form-marker set is FormType in crates/talkbank-model/src/model/content/word/form.rs. Current variants:

| Marker | Meaning |
|---|---|
| @a | Approximate / phonologically consistent form |
| @b | Babbling |
| @c | Child-invented form |
| @d | Dialect form |
| @f | Family-specific form |
| @fp | Filled pause (deprecated, use &-um etc.) |
| @g | Gemination / general special form |
| @i | Interjection |
| @k | Letter sequence (kinship) |
| @l | Single letter |
| @ls | Letter plural |
| @n | Neologism |
| @o | Onomatopoeia |
| @p | Proper name |
| @q | Metalinguistic reference |
| @sas | Second-attempt success |
| @si | Singing |
| @sl | Slang |
| @t | Test word |
| @u | Unibet transcription |
| @wp | Word play |
| @x | Complex / excluded |
| @z:<label> | User-defined special form (carries an arbitrary label) |

The second-language qualifier @s:LANG is a separate construct (see the L2 morphotag section of the Batchalign book); it is not part of FormType.
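
To make the distinction between fixed markers, the labelled @z: form, and the separate @s:LANG construct concrete, here is a minimal sketch. The `FormMarker` enum and `split_form_marker` function are hypothetical and deliberately simpler than the real `FormType`; the marker list is copied from the table above and may lag behind the source file.

```rust
/// Fixed form markers from the table above (without the leading `@`).
/// Illustrative copy; the authoritative set is `FormType` in the model crate.
const FORM_MARKERS: &[&str] = &[
    "a", "b", "c", "d", "f", "fp", "g", "i", "k", "l", "ls", "n",
    "o", "p", "q", "sas", "si", "sl", "t", "u", "wp", "x",
];

/// Hypothetical, simplified stand-in for the real `FormType`.
#[derive(Debug, PartialEq, Eq)]
pub enum FormMarker<'a> {
    /// One of the fixed markers, e.g. `o` for `@o`.
    Fixed(&'a str),
    /// `@z:<label>` with its arbitrary label.
    UserDefined(&'a str),
}

/// Splits `woof@o` into `("woof", Fixed("o"))`. Returns `None` for
/// words without a recognized form marker, including `@s:LANG` words.
pub fn split_form_marker(word: &str) -> Option<(&str, FormMarker<'_>)> {
    let (base, suffix) = word.rsplit_once('@')?;
    if base.is_empty() {
        return None;
    }
    if let Some(label) = suffix.strip_prefix("z:") {
        if label.is_empty() {
            return None;
        }
        return Some((base, FormMarker::UserDefined(label)));
    }
    if FORM_MARKERS.contains(&suffix) {
        return Some((base, FormMarker::Fixed(suffix)));
    }
    None
}
```

Because `s` is not in the marker list, a word like croissant@s:fra is rejected here, which mirrors the statement that @s:LANG is not part of FormType.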

& Markers (Events and Fillers)

| Prefix | Meaning |
|---|---|
| &= | Paralinguistic event (e.g., &=laughs) |
| &- | Filler (e.g., &-um) |
| &+ | Phonological fragment (e.g., &+sh) |
| &~ | Nonword (e.g., &~mama) |
| &* | Other speaker’s speech event (e.g., &*MOT:word, speech attributed to another speaker) |
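
Since each of these markers is identified by a two-character prefix, classification reduces to a prefix lookup. The sketch below assumes hypothetical names (`AmpKind`, `classify_amp_token`) and treats everything after the prefix as opaque content; the real grammar applies further lexical rules to that content.

```rust
/// Hypothetical classification of `&`-prefixed tokens, following the
/// table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AmpKind {
    Event,        // &=
    Filler,       // &-
    Fragment,     // &+
    Nonword,      // &~
    OtherSpeaker, // &*
}

/// Returns the marker kind and the content after the prefix, or `None`
/// if the token has no recognized prefix or no content after it.
pub fn classify_amp_token(token: &str) -> Option<(AmpKind, &str)> {
    const PREFIXES: [(&str, AmpKind); 5] = [
        ("&=", AmpKind::Event),
        ("&-", AmpKind::Filler),
        ("&+", AmpKind::Fragment),
        ("&~", AmpKind::Nonword),
        ("&*", AmpKind::OtherSpeaker),
    ];
    PREFIXES.iter().find_map(|(prefix, kind)| {
        token
            .strip_prefix(*prefix)
            .filter(|rest| !rest.is_empty())
            .map(|rest| (*kind, rest))
    })
}
```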

Scope Markers

| Marker | Meaning |
|---|---|
| [/] | Partial retrace: speaker repeats the same words |
| [//] | Full retrace: speaker restarts with different words |
| [///] | Multiple retracing: multiple false starts |
| [/-] | Reformulation: speaker rephrases with different structure |
| [*] | Error |
| [?] | Best guess |
| [>] | Overlap follows |
| [<] | Overlap precedes |
| [= text] | Explanation |
| [: text] | Replacement |
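
The scope markers in the table above split into fixed markers and two that carry free text. A minimal sketch of that distinction follows, assuming a hypothetical `Scope` enum and `classify_scope` function. It recognizes only the exact forms listed in the table and is not the parser's actual node model.

```rust
/// Hypothetical representation of the scope markers in the table above.
#[derive(Debug, PartialEq, Eq)]
pub enum Scope<'a> {
    PartialRetrace,
    FullRetrace,
    MultipleRetrace,
    Reformulation,
    Error,
    BestGuess,
    OverlapFollows,
    OverlapPrecedes,
    Explanation(&'a str),
    Replacement(&'a str),
}

/// Classifies a bracketed annotation such as `[//]` or `[: want]`.
/// Returns `None` for anything outside the listed forms.
pub fn classify_scope(annotation: &str) -> Option<Scope<'_>> {
    let inner = annotation.strip_prefix('[')?.strip_suffix(']')?;
    let scope = match inner {
        "/" => Scope::PartialRetrace,
        "//" => Scope::FullRetrace,
        "///" => Scope::MultipleRetrace,
        "/-" => Scope::Reformulation,
        "*" => Scope::Error,
        "?" => Scope::BestGuess,
        ">" => Scope::OverlapFollows,
        "<" => Scope::OverlapPrecedes,
        _ => {
            // The two text-carrying forms: `[= text]` and `[: text]`.
            if let Some(text) = inner.strip_prefix("= ") {
                Scope::Explanation(text)
            } else if let Some(text) = inner.strip_prefix(": ") {
                Scope::Replacement(text)
            } else {
                return None;
            }
        }
    };
    Some(scope)
}
```

Matching on the full inner string, rather than on a prefix, keeps `[/]`, `[//]`, and `[///]` from being confused with one another.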