Querying by "CQL" (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech.

The "cqlArr" parameter specifies a pattern to search for in text. The pattern is built up by appending components that are one of three types:

  • Exact word match (type="word").

  • Match any form of a word (type="lemma").

  • Part of speech match (type="pos").

Along with type, each component names the "item" to match (a word, a lemma, or a part-of-speech code) and has a "freq" value specifying how many times the item should appear at that location:

  • Appear once at that location (freq="once").

  • Appear zero or more times at that location (freq="zeroPlus").

  • Appear one or more times at that location (freq="onePlus").

We append these components (each with a type, an item, and a freq) together to search for patterns across corpora.
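The component structure described above can be sketched with a small constructor. Note that cqlComponent() is a hypothetical helper written for this illustration, not a function provided by the package; it only builds the list and checks type and freq against the legal values listed above.

```r
# Hypothetical helper (not part of the package): builds one CQL component
# and rejects type/freq values outside the documented sets.
cqlComponent <- function(type = c("word", "lemma", "pos"),
                         item,
                         freq = c("once", "zeroPlus", "onePlus")) {
  type <- match.arg(type)
  freq <- match.arg(freq)
  stopifnot(is.character(item), length(item) == 1)
  list(type = type, item = item, freq = freq)
}

# Any form of "go" followed by zero or more adverbs.
pattern <- list(
  cqlComponent("lemma", "go"),
  cqlComponent("pos", "adv", "zeroPlus")
)
str(pattern)
```

The resulting list has the same shape as the cqlArr values shown in the examples, so it could be passed as cqlArr directly.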

Some examples:

  • To find all instances of exactly "go home": cqlArr=list( list(type="word", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home"

  • To find all instances of any form of "go" followed by "home", we use type="lemma" for "go": cqlArr=list( list(type="lemma", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home" "goes home" "went home" "going home"

  • To find all instances of a subject pronoun, followed by any form of "go", followed by one or more adverbs, followed by "home": cqlArr=list( list(type="pos", item="pro:sub", freq="once"), list(type="lemma", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "they went back home" "they go back home" "he went back home" "we went back home" others...
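Longer patterns are easier to check when written one component per line. The sketch below rebuilds the last example that way and tabulates it with base R only; patternTable is a name chosen for this illustration.

```r
# Subject pronoun, any form of "go", one or more adverbs, then "home".
cqlArr <- list(
  list(type = "pos",   item = "pro:sub", freq = "once"),
  list(type = "lemma", item = "go",      freq = "once"),
  list(type = "pos",   item = "adv",     freq = "onePlus"),
  list(type = "word",  item = "home",    freq = "once")
)

# One row per position in the pattern, in search order.
patternTable <- do.call(rbind, lapply(cqlArr, as.data.frame))
print(patternTable)
```

Reading the table top to bottom gives the order in which the items must appear in a matching utterance.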

There are many "item" values for part of speech (type="pos"). See the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes.

getCQL(
  cqlArr = NULL,
  corpusName = NULL,
  corpora = NULL,
  lang = NULL,
  media = NULL,
  age = NULL,
  gender = NULL,
  designType = NULL,
  activityType = NULL,
  groupType = NULL,
  auth = FALSE
)

Arguments

cqlArr

Query by grammatical pattern. For example, to search for all utterances where a speaker says "go" once, followed by an adverb occurring one or more times: cqlArr=list(list(type="word", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus")). Legal values for type are: "word" to match an exact word, "lemma" to match all forms of a word, "pos" to match a part of speech. Legal values for item are any word, lemma, or part-of-speech code (see the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes). Legal values for freq are "once", "onePlus", and "zeroPlus".

corpusName

Name of corpus to query. For example, to search within the childes corpus: corpusName="childes".

corpora

Path to the corpus or corpora to query. The path starts with the corpus name, followed by the subfolder names leading to a folder; all transcripts beneath that folder are queried. For example, to query all transcripts in the MacWhinney childes corpus: corpora = c('childes', 'Eng-NA', 'MacWhinney').

lang

Query by language. For example, to get transcripts that contain both English and Spanish: lang=c("eng", "spa"). Legal values: 3-letter language codes based on the ISO 639-3 standard.

media

Query by media type. For example, to get transcripts with an associated video recording: media=c("video"). Legal values: "audio" or "video".

age

Query by participant age range in months. For example, to get transcripts with target participants who are 14-18 months old: age=c(from="14", to="18").

gender

Query by participant gender. For example, to get transcripts with female target participants: gender=c("female"). Legal values: "female" or "male".

designType

Query by design type. For example, to get transcripts from a longitudinal study: designType=c("long"). Legal values are "long" for longitudinal studies and "cross" for cross-sectional studies.

activityType

Query by activity type. For example, to get transcripts where the target participant is engaged in toy play: activityType=c("toyplay"). See the CHAT manual for legal values.

groupType

Query by group type. For example, to get transcripts where the target participant is hearing limited: groupType=c("HL"). See the CHAT manual for legal values.

auth

Determines whether the user should be prompted to authenticate in order to access protected collections. Defaults to FALSE.
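The arguments can be combined in one call. The following is a minimal sketch: it assumes the package that provides getCQL() is installed under the name TalkBankDB, and it keeps the network request behind a flag (runQuery, a name chosen for this illustration). The particular filter combination is illustrative, and whether it returns any rows depends on the corpus contents.

```r
# Pattern: the word "go" followed by one or more adverbs.
queryArgs <- list(
  cqlArr = list(
    list(type = "word", item = "go",  freq = "once"),
    list(type = "pos",  item = "adv", freq = "onePlus")
  ),
  corpusName = "childes",
  corpora    = c("childes", "Eng-NA", "MacWhinney"),
  lang       = c("eng"),
  gender     = c("female")
)

# Set to TRUE to send the query; it needs network access and the package
# that provides getCQL() (assumed here to be named TalkBankDB).
runQuery <- FALSE
if (runQuery && requireNamespace("TalkBankDB", quietly = TRUE)) {
  result <- do.call(TalkBankDB::getCQL, queryArgs)
  print(head(result))
}
```

Keeping the arguments in a named list makes it easy to add or drop a filter without rewriting the call.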

Examples

getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"), list(type="lemma", item="ball", freq="once")), corpusName = 'childes', corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
#>           role monthage docid uid who
#> 1 Target_Child       30 19742  23 CHI
#> 2 Target_Child       36 19786  85 CHI
#> 3 Target_Child       40 19811 373 CHI
#> 4 Target_Child       41 19812   6 CHI
#> 5 Target_Child       41 19815 249 CHI
#> 6 Target_Child       44 19829 147 CHI
#>                                                utt filename
#> 1                        I hafta get << my ball >>   020627
#> 2                       but I forgot << my ball >>  030018b
#> 3 I threw << my ball >> over that fence next doors   030429
#> 4        I throw << my ball >> back to their house   030512
#> 5                   can he play with << my ball >>  030526a
#> 6                     I magiced << my ball >> away   030805
#>                                path
#> 1  childes/Eng-NA/MacWhinney/020627
#> 2 childes/Eng-NA/MacWhinney/030018b
#> 3  childes/Eng-NA/MacWhinney/030429
#> 4  childes/Eng-NA/MacWhinney/030512
#> 5 childes/Eng-NA/MacWhinney/030526a
#> 6  childes/Eng-NA/MacWhinney/030805
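In the output above, the matched span appears between "<<" and ">>" in the utt column. Assuming that marker convention holds for every returned row, the matched text could be pulled out with a base R regular expression. The utterances below are copied from the output above; matched is a name chosen for this illustration.

```r
# Utterances copied from the example output above.
utt <- c(
  "I hafta get << my ball >>",
  "I threw << my ball >> over that fence next doors",
  "I magiced << my ball >> away"
)

# Keep only the text between the "<<" and ">>" markers.
# Assumes each utterance contains a single marked span.
matched <- sub(".*<< (.*) >>.*", "\\1", utt)
print(matched)
```

With a real result, the same expression could be applied to the utt column of the returned data frame.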