Querying by "CQL" (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech.

The "cqlArr" parameter specifies a pattern to search for in text. The pattern is built up by appending components that are one of three types:

  • Exact word match (type="word").

  • Match any form of a word (type="lemma").

  • Match a part of speech (type="pos").

Along with type and the "item" to match, each component has a "freq" value specifying how many times the item should appear at that location:

  • Appear once at that location (freq="once").

  • Appear zero or more times at that location (freq="zeroPlus").

  • Appear one or more times at location (freq="onePlus").

We append these components (each a type, an item, and a freq) together to search for patterns across corpora.
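As a minimal sketch of how such components can be assembled, the helper below builds one component and checks type and freq against the legal values listed above. Note that cqlComponent() is a hypothetical name introduced here for illustration; it is not part of the package.

```r
# Hypothetical helper, NOT part of the package: builds one cqlArr component
# and validates type and freq against the legal values listed above.
cqlComponent <- function(type, item, freq = "once") {
  legalTypes <- c("word", "lemma", "pos")
  legalFreqs <- c("once", "zeroPlus", "onePlus")
  if (!is.character(item) || length(item) != 1) {
    stop("item must be a single character string")
  }
  if (!type %in% legalTypes) {
    stop("type must be one of: ", paste(legalTypes, collapse = ", "))
  }
  if (!freq %in% legalFreqs) {
    stop("freq must be one of: ", paste(legalFreqs, collapse = ", "))
  }
  list(type = type, item = item, freq = freq)
}

# Append components in the order they should appear in the utterance.
pattern <- list(
  cqlComponent("lemma", "go"),
  cqlComponent("pos", "adv", "onePlus"),
  cqlComponent("word", "home")
)
str(pattern)
```

The resulting list has the same shape as the cqlArr values shown in this section, so a misspelled type or freq is caught locally before any query is sent.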

Some examples:

  • To find all instances of exactly "go home": cqlArr=list( list(type="word", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home"

  • To find all instances of any form of "go" followed by "home", we use type="lemma" for "go": cqlArr=list( list(type="lemma", item="go", freq="once"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "go home" "goes home" "went home" "going home"

  • To find all instances of a subject pronoun, followed by any form of "go", followed by one or more adverbs, followed by "home": cqlArr=list( list(type="pos", item="pro:sub", freq="once"), list(type="lemma", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus"), list(type="word", item="home", freq="once"))

This matches all utterances containing: "they went back home" "they go back home" "he went back home" "we went back home" others...
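None of the examples above use freq="zeroPlus". As a sketch, the pattern below makes the adverb optional; going by the freq definitions given earlier, it should cover both "went home" and "went back home", though that expectation has not been checked against the service. The as.data.frame() step is only a local convenience for inspecting the pattern.

```r
# Any form of "go", then zero or more adverbs, then "home".
cqlArr <- list(
  list(type = "lemma", item = "go",   freq = "once"),
  list(type = "pos",   item = "adv",  freq = "zeroPlus"),
  list(type = "word",  item = "home", freq = "once")
)

# Show the pattern as a small table, one row per component, to check
# the order and the freq values before sending the query.
patternTable <- do.call(rbind, lapply(cqlArr, as.data.frame))
print(patternTable)
```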

There are many "item" values for part of speech (type="pos"). See the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes.

getCQL(
  cqlArr = NULL,
  corpusName = NULL,
  corpora = NULL,
  lang = NULL,
  media = NULL,
  age = NULL,
  gender = NULL,
  designType = NULL,
  activityType = NULL,
  groupType = NULL,
  auth = FALSE
)

Arguments

cqlArr

Query by grammatical pattern. For example, to search for all utterances where a speaker says "go" once, followed by an adverb occurring one or more times: cqlArr=list(list(type="word", item="go", freq="once"), list(type="pos", item="adv", freq="onePlus")). Legal values for type are: "word" to match an exact word, "lemma" to match all forms of a word, "pos" to match parts of speech. Legal values for item are any word, word lemma, or part-of-speech code (see the CHAT manual or the CQL tab on TalkBankDB (https://talkbank.org/DB) for legal part-of-speech codes). Legal values for freq are "once", "onePlus", and "zeroPlus".

corpusName

Name of corpus to query. For example, to search within the childes corpus: corpusName="childes".

corpora

Name of the corpus or corpora to query. This is a path that starts with the corpus name, followed by subfolder names leading to a folder; all transcripts beneath that folder are queried. For example, to query all transcripts in the MacWhinney childes corpus: corpora = c('childes', 'Eng-NA', 'MacWhinney').

lang

Query by language. For example, to get transcripts that contain both English and Spanish: lang=c("eng", "spa"). Legal values: 3-letter language codes based on the ISO 639-3 standard.

media

Query by media type. For example, to get transcripts with an associated video recording: media=c("video"). Legal values: "audio" or "video".

age

Query by participant age range in months. For example, to get transcripts with target participants who are 3 to 12 months old: age=c(from="3", to="12").

gender

Query by participant gender. For example, to get transcripts with female target participants: gender=c("female"). Legal values: "female" or "male".

designType

Query by design type. For example, to get transcripts from a longitudinal study: designType=c("long"). Legal values are "long" for longitudinal studies and "cross" for cross-sectional studies.

activityType

Query by activity type. For example, to get transcripts where the target participant is engaged in toy play: activityType=c("toyplay"). See the CHAT manual for legal values.

groupType

Query by group type. For example, to get transcripts where the target participant is hearing limited: groupType=c("HL"). See the CHAT manual for legal values.

auth

Determines whether the user is prompted to authenticate in order to access protected collections. Defaults to FALSE.
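The arguments above can be combined in a single call. The sketch below assembles them as a named list, using the value formats given in the argument descriptions, and only contacts the service in an interactive session. Two assumptions: the package is installed under the name TalkBankDB, and this particular combination of filters may return few or no rows.

```r
# Pattern: the word "go" once, followed by one or more adverbs.
args <- list(
  cqlArr = list(
    list(type = "word", item = "go",  freq = "once"),
    list(type = "pos",  item = "adv", freq = "onePlus")
  ),
  corpusName = "childes",
  corpora    = c("childes", "Eng-NA", "MacWhinney"),
  lang       = c("eng"),
  gender     = c("female")
)

# The query needs network access, so run it only in an interactive session
# and only if the package (assumed to be named TalkBankDB) is installed.
if (interactive() && requireNamespace("TalkBankDB", quietly = TRUE)) {
  res <- do.call(TalkBankDB::getCQL, args)
  print(head(res))
}
```

Keeping the arguments in a list makes it easy to reuse the same pattern with different metadata filters.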

Examples

getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"),
                   list(type="lemma", item="ball", freq="once")),
       corpusName = 'childes',
       corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
#>            role monthage                             docID uid who
#> 1  Target_Child       30  childes/Eng-NA/MacWhinney/020627  23 CHI
#> 2  Target_Child       36 childes/Eng-NA/MacWhinney/030018b  85 CHI
#> 3  Target_Child       44  childes/Eng-NA/MacWhinney/030805 147 CHI
#> 4  Target_Child       40  childes/Eng-NA/MacWhinney/030429 373 CHI
#> 5  Target_Child       40  childes/Eng-NA/MacWhinney/030429 373 CHI
#> 6  Target_Child       40  childes/Eng-NA/MacWhinney/030429 373 CHI
#> 7  Target_Child       40  childes/Eng-NA/MacWhinney/030429 373 CHI
#> 8  Target_Child       41  childes/Eng-NA/MacWhinney/030512   6 CHI
#> 9  Target_Child       41  childes/Eng-NA/MacWhinney/030512   6 CHI
#> 10 Target_Child       41  childes/Eng-NA/MacWhinney/030512   6 CHI
#> 11 Target_Child       41  childes/Eng-NA/MacWhinney/030512   6 CHI
#> 12 Target_Child       41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 13 Target_Child       41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 14 Target_Child       41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#> 15 Target_Child       41 childes/Eng-NA/MacWhinney/030526a 249 CHI
#>                                                  utt filename
#> 1                         I hafta get << my ball >>    020627
#> 2                        but I forgot << my ball >>   030018b
#> 3                      I magiced << my ball >> away    030805
#> 4  I threw << my ball >> over that fence next doors    030429
#> 5  I threw << my ball >> over that fence next doors    030429
#> 6  I threw << my ball >> over that fence next doors    030429
#> 7  I threw << my ball >> over that fence next doors    030429
#> 8         I throw << my ball >> back to their house    030512
#> 9         I throw << my ball >> back to their house    030512
#> 10        I throw << my ball >> back to their house    030512
#> 11        I throw << my ball >> back to their house    030512
#> 12                   can he play with << my ball >>   030526a
#> 13                   can he play with << my ball >>   030526a
#> 14                   can he play with << my ball >>   030526a
#> 15                   can he play with << my ball >>   030526a
#>                                 path
#> 1   childes/Eng-NA/MacWhinney/020627
#> 2  childes/Eng-NA/MacWhinney/030018b
#> 3   childes/Eng-NA/MacWhinney/030805
#> 4   childes/Eng-NA/MacWhinney/030429
#> 5   childes/Eng-NA/MacWhinney/030429
#> 6   childes/Eng-NA/MacWhinney/030429
#> 7   childes/Eng-NA/MacWhinney/030429
#> 8   childes/Eng-NA/MacWhinney/030512
#> 9   childes/Eng-NA/MacWhinney/030512
#> 10  childes/Eng-NA/MacWhinney/030512
#> 11  childes/Eng-NA/MacWhinney/030512
#> 12 childes/Eng-NA/MacWhinney/030526a
#> 13 childes/Eng-NA/MacWhinney/030526a
#> 14 childes/Eng-NA/MacWhinney/030526a
#> 15 childes/Eng-NA/MacWhinney/030526a
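The output above contains repeated rows for the same utterance. As a sketch of one way to tidy such a result, the code below rebuilds a few of those rows by hand, keeps one row per utterance, and extracts the matched text. It assumes that docID together with uid identifies an utterance and that the << >> markers delimit the matched span; both are inferred from the output shown, not documented behaviour.

```r
# A few rows copied from the output above (columns trimmed for brevity).
res <- data.frame(
  docID = c("childes/Eng-NA/MacWhinney/020627",
            "childes/Eng-NA/MacWhinney/030429",
            "childes/Eng-NA/MacWhinney/030429",
            "childes/Eng-NA/MacWhinney/030512"),
  uid = c(23, 373, 373, 6),
  utt = c("I hafta get << my ball >>",
          "I threw << my ball >> over that fence next doors",
          "I threw << my ball >> over that fence next doors",
          "I throw << my ball >> back to their house"),
  stringsAsFactors = FALSE
)

# Keep the first row for each (docID, uid) pair.
# Assumption: this pair identifies one utterance.
hits <- res[!duplicated(res[c("docID", "uid")]), ]

# Extract the text between the << >> markers (assumed to mark the match).
hits$match <- sub(".*<< (.*) >>.*", "\\1", hits$utt)
print(hits[c("uid", "match")])
```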