Querying by "CQL" (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech.
The "cqlArr" parameter specifies a pattern to search for in text. The pattern is built up by appending components that are one of three types:
Exact word match (type="word").
Match any form of a word (type="lemma").
Match a part of speech (type="pos").
Along with type, each component has an "item" value (the word, lemma, or part-of-speech code to match) and a "freq" value specifying how many times the item should appear at that location:
Appear once at that location (freq="once").
Appear zero or more times at that location (freq="zeroPlus").
Appear one or more times at that location (freq="onePlus").
We chain these components (each with a type, an item, and a freq) together to search for patterns across corpora.
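The component structure described above can be sketched in base R. The helper name cql_component is hypothetical and not part of the package; it only builds the plain list that cqlArr expects and rejects type or freq values other than the ones listed above.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# builds one cqlArr component and checks type/freq against the documented values.
cql_component <- function(type, item, freq = "once") {
  stopifnot(
    is.character(type), length(type) == 1,
    type %in% c("word", "lemma", "pos"),
    is.character(freq), length(freq) == 1,
    freq %in% c("once", "zeroPlus", "onePlus"),
    is.character(item), length(item) == 1, nzchar(item)
  )
  list(type = type, item = item, freq = freq)
}

# Any form of "go", then one or more adverbs, then the exact word "home".
pattern <- list(
  cql_component("lemma", "go"),
  cql_component("pos", "adv", freq = "onePlus"),
  cql_component("word", "home")
)
str(pattern)
```

The resulting pattern is an ordinary list of lists, so it could be passed as cqlArr in the same way as the hand-written lists in the examples.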
Some examples:
To find all instances of exactly "go home": cqlArr=list( list(type="word", item="go", freq="once"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "go home"
To find all instances of any form of "go" followed by "home", we use type="lemma" for "go": cqlArr=list( list(type="lemma", item="go", freq="once"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "go home" "goes home" "went home" "going home"
To find all instances of a subject pronoun, followed by any form of "go", followed by one or more adverbs, followed by "home": cqlArr=list( list(type="pos", item="pro:sub", freq="once"), list(type="lemma", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus"), list(type="word", item="home", freq="once"))
This matches all utterances containing: "they went back home" "they go back home" "he went back home" "we went back home" others...
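Longer patterns like the last one can be hard to read as nested lists. As a rough local debugging aid, the sketch below prints a cqlArr as a single line. The function describe_pattern and its bracket notation are assumptions made up for display only; they are not the query syntax that TalkBankDB uses.

```r
# Hypothetical display helper: the "[type=item]" notation with "*" and "+"
# suffixes is invented here for readability and is not sent to any server.
describe_pattern <- function(cqlArr) {
  suffix <- c(once = "", zeroPlus = "*", onePlus = "+")
  parts <- vapply(cqlArr, function(comp) {
    paste0("[", comp$type, "=", comp$item, "]", suffix[[comp$freq]])
  }, character(1))
  paste(parts, collapse = " ")
}

cqlArr <- list(
  list(type = "pos",   item = "pro:sub", freq = "once"),
  list(type = "lemma", item = "go",      freq = "once"),
  list(type = "pos",   item = "adv",     freq = "onePlus"),
  list(type = "word",  item = "home",    freq = "once")
)

described <- describe_pattern(cqlArr)
print(described)
#> [1] "[pos=pro:sub] [lemma=go] [pos=adv]+ [word=home]"
```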
There are many "item" values for part of speech (type="pos"). See the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes.
getCQL(
cqlArr = NULL,
corpusName = NULL,
corpora = NULL,
lang = NULL,
media = NULL,
age = NULL,
gender = NULL,
designType = NULL,
activityType = NULL,
groupType = NULL,
auth = FALSE
)
Query by grammatical pattern. For example, to search for all utterances where a speaker says "go" once, followed by an adverb occurring one or more times: cqlArr=list(list(type="word", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus")). Legal values for type are: "word" to match an exact word, "lemma" to match all forms of a word, "pos" to match parts of speech. Legal values for item are any word, word lemma, or part-of-speech code (see the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes). Legal values for freq are "once", "onePlus", and "zeroPlus".
Name of corpus to query. For example, to search within the childes corpus: corpusName="childes".
Path of the corpus/corpora to query. This is a path starting with the corpus name followed by subfolder names leading to a folder for which all transcripts beneath it will be queried. For example, to query all transcripts in the MacWhinney childes corpus: corpora = c('childes', 'Eng-NA', 'MacWhinney').
Query by language. For example, to get transcripts that contain both English and Spanish: lang=c("eng", "spa"). Legal values: 3-letter language codes based on the ISO 639-3 standard.
Query by media type. For example, to get transcripts with an associated video recording: media=c("video"). Legal values: "audio" or "video".
Query by participant age range in months. For example, to get transcripts with target participants who are 14-18 months old: age=c(from="14", to="18").
Query by participant gender. For example, to get transcripts with female target participants: gender=c("female"). Legal values: "female" or "male".
Query by design type. For example, to get transcripts from a longitudinal study: designType=c("long"). Legal values are "long" for longitudinal studies and "cross" for cross-sectional studies.
Query by activity type. For example, to get transcripts where the target participant is engaged in toy play: activityType=c("toyplay"). See the CHAT manual for legal values.
Query by group type. For example, to get transcripts where the target participant is hearing limited: groupType=c("HL"). See the CHAT manual for legal values.
Whether the user should be prompted to authenticate in order to access protected collections. Defaults to FALSE.
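The filter arguments can be combined with a pattern in a single call. The sketch below collects them in a list (query_args is a name chosen here for illustration) and only calls getCQL() if the function is available in the session, since the request needs the package to be loaded and network access. Whether this particular combination of filters returns any rows depends on the corpus and is not something this sketch asserts.

```r
# Arguments for one query: "go" followed by one or more adverbs, limited to
# English transcripts from longitudinal studies under childes/Eng-NA/MacWhinney.
query_args <- list(
  cqlArr = list(
    list(type = "word", item = "go",  freq = "once"),
    list(type = "pos",  item = "adv", freq = "onePlus")
  ),
  corpusName = "childes",
  corpora    = c("childes", "Eng-NA", "MacWhinney"),
  lang       = c("eng"),
  designType = c("long")
)

# Run the query only when getCQL() exists in the current session;
# otherwise the arguments are just built and left for inspection.
if (exists("getCQL", mode = "function")) {
  result <- do.call(getCQL, query_args)
  print(head(result))
}
```

Arguments left out of the list keep their defaults from the usage above (NULL for the filters, FALSE for auth).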
getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"),
list(type="lemma", item="ball", freq="once")),
corpusName = 'childes',
corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
#> role monthage docID uid who
#> 1 Target_Child 30 childes/Eng-NA/MacWhinney/020627 23 CHI
#> 2 Target_Child 36 childes/Eng-NA/MacWhinney/030018b 85 CHI
#> 3 Target_Child 44 childes/Eng-NA/MacWhinney/030805 147 CHI
#> 4 Target_Child 40 childes/Eng-NA/MacWhinney/030429 373 CHI
#> 5 Target_Child 40 childes/Eng-NA/MacWhinney/030429 373 CHI
#> 6 Target_Child 40 childes/Eng-NA/MacWhinney/030429 373 CHI
#> 7 Target_Child 40 childes/Eng-NA/MacWhinney/030429 373 CHI
#> 8 Target_Child 41 childes/Eng-NA/MacWhinney/030512 6 CHI
#> 9 Target_Child 41 childes/Eng-NA/MacWhinney/030512 6 CHI
#> 10 Target_Child 41 childes/Eng-NA/MacWhinney/030512 6 CHI
#> 11 Target_Child 41 childes/Eng-NA/MacWhinney/030512 6 CHI
#> 12 Target_Child 41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 13 Target_Child 41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 14 Target_Child 41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 15 Target_Child 41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> utt filename
#> 1 I hafta get << my ball >> 020627
#> 2 but I forgot << my ball >> 030018b
#> 3 I magiced << my ball >> away 030805
#> 4 I threw << my ball >> over that fence next doors 030429
#> 5 I threw << my ball >> over that fence next doors 030429
#> 6 I threw << my ball >> over that fence next doors 030429
#> 7 I threw << my ball >> over that fence next doors 030429
#> 8 I throw << my ball >> back to their house 030512
#> 9 I throw << my ball >> back to their house 030512
#> 10 I throw << my ball >> back to their house 030512
#> 11 I throw << my ball >> back to their house 030512
#> 12 can he play with << my ball >> 030526a
#> 13 can he play with << my ball >> 030526a
#> 14 can he play with << my ball >> 030526a
#> 15 can he play with << my ball >> 030526a
#> path
#> 1 childes/Eng-NA/MacWhinney/020627
#> 2 childes/Eng-NA/MacWhinney/030018b
#> 3 childes/Eng-NA/MacWhinney/030805
#> 4 childes/Eng-NA/MacWhinney/030429
#> 5 childes/Eng-NA/MacWhinney/030429
#> 6 childes/Eng-NA/MacWhinney/030429
#> 7 childes/Eng-NA/MacWhinney/030429
#> 8 childes/Eng-NA/MacWhinney/030512
#> 9 childes/Eng-NA/MacWhinney/030512
#> 10 childes/Eng-NA/MacWhinney/030512
#> 11 childes/Eng-NA/MacWhinney/030512
#> 12 childes/Eng-NA/MacWhinney/030526a
#> 13 childes/Eng-NA/MacWhinney/030526a
#> 14 childes/Eng-NA/MacWhinney/030526a
#> 15 childes/Eng-NA/MacWhinney/030526a
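The sample output above contains repeated rows (for example, rows 4 to 7). If one row per utterance is wanted, a base R sketch such as the following could be applied to the result. It assumes the result is a data frame and that docID and uid together identify an utterance, which the output suggests but this page does not state. The mock rows are copied from the output above, with only three of its columns.

```r
# Mock result: a few rows copied from the sample output (subset of columns).
result <- data.frame(
  docID = c("childes/Eng-NA/MacWhinney/020627",
            "childes/Eng-NA/MacWhinney/030429",
            "childes/Eng-NA/MacWhinney/030429",
            "childes/Eng-NA/MacWhinney/030512"),
  uid   = c(23, 373, 373, 6),
  utt   = c("I hafta get << my ball >>",
            "I threw << my ball >> over that fence next doors",
            "I threw << my ball >> over that fence next doors",
            "I throw << my ball >> back to their house"),
  stringsAsFactors = FALSE
)

# Assumption: a (docID, uid) pair identifies one utterance, so keep the first
# row for each pair and drop the repeats.
deduped <- result[!duplicated(result[, c("docID", "uid")]), ]

# Extract the text between the << >> markers that flag the matched span.
deduped$match <- sub(".*<< (.*?) >>.*", "\\1", deduped$utt, perl = TRUE)

print(deduped[, c("docID", "uid", "match")])
```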