A glimpse at each TBDBr query function

getTranscripts

Queries a table where each row represents a transcript.

  • Link to view transcript and play any associated media.
  • Corpus path to transcript.
  • Media types (audio/video) linked to transcript.
  • Unique ID for transcript (PID).
  • Languages spoken.
  • Date recorded.
  • Design Type.
  • Activity Type.
  • Group Type.
# Get English-North American transcripts in CHILDES.
transcripts <- getTranscripts(corpusName = 'childes',
                              corpora = c('childes', 'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
path                               filename  languages  media  date  pid                 designType  activityType  groupType
childes/Eng-NA/Bates/Free20/amy    amy       eng        NULL   NULL  11312/c-00015218-1  cross       toyplay       TD
childes/Eng-NA/Bates/Free20/betty  betty     eng        NULL   NULL  11312/c-00015219-1  cross       toyplay       TD
childes/Eng-NA/Bates/Free20/chuck  chuck     eng        NULL   NULL  11312/c-00015220-1  cross       toyplay       TD
childes/Eng-NA/Bates/Free20/doug   doug      eng        NULL   NULL  11312/c-00015221-1  cross       toyplay       TD
childes/Eng-NA/Bates/Free20/ed     ed        eng        NULL   NULL  11312/c-00015222-1  cross       toyplay       TD
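
The corpora vectors used in these calls appear to mirror the path column split on "/". A minimal sketch of that relationship, assuming the result is a data frame with the columns shown above. The data frame below is a hand-built stand-in copied from two sample rows, not a live query.

```r
# Stand-in for a getTranscripts() result, copied from the sample rows above
# (assumption: the real result is a data frame with these column names).
transcripts <- data.frame(
  path = c("childes/Eng-NA/Bates/Free20/amy",
           "childes/Eng-NA/Bates/Free20/betty"),
  filename = c("amy", "betty"),
  stringsAsFactors = FALSE
)

# Split each path into its folder components.
parts <- strsplit(transcripts$path, "/", fixed = TRUE)

# The components of the first path, in the same form as the corpora argument.
corpora_amy <- parts[[1]]
print(corpora_amy)

# Count transcripts per collection (the third path component).
collection <- vapply(parts, function(p) p[3], character(1))
print(table(collection))
```

Under this assumption, a vector such as corpora_amy could be passed as corpora to narrow a later query to a single transcript.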

getParticipants

Queries a table where each row represents a participant (speaker) listed in a transcript.

  • Link to view transcript and play any associated media.
  • Corpus path to transcript.
  • Speaker’s ID.
  • Speaker’s name.
  • Speaker’s role.
  • Speaker’s language.
  • Speaker’s age in months.
  • Speaker’s age in Years/Months/Days.
  • Speaker’s gender.
  • Number of words spoken by speaker.
  • Number of utterances spoken by speaker.
  • Average number of words per speaker’s utterance.
  • Median number of words per speaker’s utterance.
# Get English-North American participants in CHILDES.
participants <- getParticipants(corpusName = 'childes',
                                corpora = c('childes',
                                            'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
filename  path                               who  name   role          language  monthage  age      sex     numwords  numutts  avgutt   medianutt
amy       childes/Eng-NA/Bates/Free20/amy    MOT  NULL   Mother        eng       NULL      NULL     female  220       80       2.75     3
amy       childes/Eng-NA/Bates/Free20/amy    CHI  NULL   Target_Child  eng       20        1;08.00  female  33        32       1.03125  1
betty     childes/Eng-NA/Bates/Free20/betty  CHI  Betty  Target_Child  eng       20        1;08.00  female  13        12       1.08333  1
betty     childes/Eng-NA/Bates/Free20/betty  MOT  NULL   Mother        eng       NULL      NULL     female  354       93       3.80645  4
chuck     childes/Eng-NA/Bates/Free20/chuck  CHI  Chuck  Target_Child  eng       20        1;08.00  male    67        48       1.39583  1
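
In the sample rows, avgutt equals numwords divided by numutts. The sketch below checks this on a hand-built copy of those rows and selects the target children. It assumes a data frame result with numeric count columns; a live result may need as.numeric() applied first.

```r
# Stand-in for a getParticipants() result, copied from the sample rows above.
participants <- data.frame(
  filename = c("amy", "amy", "betty", "betty", "chuck"),
  who      = c("MOT", "CHI", "CHI", "MOT", "CHI"),
  role     = c("Mother", "Target_Child", "Target_Child", "Mother", "Target_Child"),
  numwords = c(220, 33, 13, 354, 67),
  numutts  = c(80, 32, 12, 93, 48),
  avgutt   = c(2.75, 1.03125, 1.08333, 3.80645, 1.39583),
  stringsAsFactors = FALSE
)

# Keep only the rows for target children.
children <- participants[participants$role == "Target_Child", ]

# Recompute words per utterance and compare with the reported avgutt
# (a tolerance is needed because the reported values are rounded).
recomputed <- participants$numwords / participants$numutts
matches <- isTRUE(all.equal(recomputed, participants$avgutt, tolerance = 1e-4))
print(matches)
```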

getTokens

Queries a table with all the words from the selected transcripts, one word (token) per row.

  • Link to view transcript and play any associated media.
  • Corpus path to transcript.
  • Utterance sequence number (starts at 0).
  • Word sequence number within utterance (starts at 0).
  • Speaker’s role.
  • Speaker’s ID.
  • The word (token).
  • The word’s stem.
  • Part of speech code. (See CHAT manual for descriptions of codes).
# Get tokens (words) from one transcript.
tokens <- getTokens(corpusName = 'childes',
                    corpora = c('childes',
                                'Eng-NA',
                                'MacWhinney',
                                '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
filename  path                               uid  wordnum  role    who  word   stem  pos
010411a   childes/Eng-NA/MacWhinney/010411a  0    0        Father  FAT  wanna  want  v
010411a   childes/Eng-NA/MacWhinney/010411a  0    1        Father  FAT  give   give  v
010411a   childes/Eng-NA/MacWhinney/010411a  0    2        Father  FAT  me     me    pro:obj
010411a   childes/Eng-NA/MacWhinney/010411a  0    3        Father  FAT  a      a     det:art
010411a   childes/Eng-NA/MacWhinney/010411a  0    4        Father  FAT  kiss   kiss  n
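
Because each token carries its utterance number (uid) and its position within the utterance (wordnum), the utterance text can be rebuilt from the token table. A minimal base-R sketch on a hand-built copy of the sample rows, assuming the result is a data frame with these columns:

```r
# Stand-in for a getTokens() result, copied from the sample rows above.
tokens <- data.frame(
  uid     = c(0, 0, 0, 0, 0),
  wordnum = c(0, 1, 2, 3, 4),
  who     = "FAT",
  word    = c("wanna", "give", "me", "a", "kiss"),
  pos     = c("v", "v", "pro:obj", "det:art", "n"),
  stringsAsFactors = FALSE
)

# Sort by utterance, then by word position, so words join in spoken order.
tokens <- tokens[order(tokens$uid, tokens$wordnum), ]

# Paste the words of each utterance back together, one row per uid.
rebuilt <- aggregate(word ~ uid, data = tokens, FUN = paste, collapse = " ")
print(rebuilt)
```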

getTokenTypes

Queries a table with all the words from the selected transcripts condensed into “types” based on word form and part of speech.

  • Speaker’s role.
  • The word.
  • Number of occurrences of the word in the selected transcripts.
  • Part of speech (See CHAT manual for descriptions of codes).
  • The word’s stem.
# Get token types from MacWhinney set.
token.types <- getTokenTypes(corpusName = 'childes',
                             corpora = c('childes',
                                         'Eng-NA',
                                         'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role          word  count  pos   stem
Target_Child  nice  138    adj   nice
Target_Child  fiu   1      L2    fiu
Target_Child  tape  123    n     tape
Target_Child  tape  1      meta  tape
Target_Child  tape  5      v     tape
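
Since types are split by part of speech, one word form can occupy several rows ("tape" appears as n, meta, and v above). The sketch below collapses those rows into a single count per word form. It uses a hand-built copy of the sample rows and assumes a data frame result with a numeric count column.

```r
# Stand-in for a getTokenTypes() result, copied from the sample rows above.
token_types <- data.frame(
  role  = "Target_Child",
  word  = c("nice", "fiu", "tape", "tape", "tape"),
  count = c(138, 1, 123, 1, 5),
  pos   = c("adj", "L2", "n", "meta", "v"),
  stringsAsFactors = FALSE
)

# Sum the counts over all parts of speech for each word form.
by_word <- aggregate(count ~ word, data = token_types, FUN = sum)

# Most frequent word forms first.
by_word <- by_word[order(-by_word$count), ]
print(by_word)
```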

getUtterances

Queries a table with all the utterances from the selected transcripts, one utterance per row.

  • Link to view transcript and play any associated media.
  • Corpus path to transcript.
  • Utterance sequence number (starts at 0).
  • Word sequence number within utterance (starts at 0).
  • Speaker’s ID.
  • Speaker’s role.
  • Utterance postcodes.
  • Utterance GEMS.
  • Utterance.
  • Start time of utterance in associated media.
  • End time of utterance in associated media.
# Get utterances from one transcript.
utterances <- getUtterances(corpusName = 'childes',
                            corpora = c('childes',
                                        'Eng-NA',
                                        'MacWhinney',
                                        '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
    filename  path                               uid  who  role          postcodes  gems  utterance                                         startTime  endTime
10  010411a   childes/Eng-NA/MacWhinney/010411a  9    CHI  Target_Child  NULL       NULL  what’s that                                       33.398     33.714
11  010411a   childes/Eng-NA/MacWhinney/010411a  10   FAT  Father        NULL       NULL  taperecorder over there                           33.714     34.884
12  010411a   childes/Eng-NA/MacWhinney/010411a  11   CHI  Target_Child  NULL       NULL  hm                                                34.884     35.999
13  010411a   childes/Eng-NA/MacWhinney/010411a  12   FAT  Father        NULL       NULL  do you have some nice little things to say to it  35.999     37.818
14  010411a   childes/Eng-NA/MacWhinney/010411a  13   CHI  Target_Child  NULL       NULL  hi                                                37.818     38.394
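
The start and end times make it possible to derive a duration for each utterance. A minimal sketch on a hand-built copy of the sample rows, assuming the result is a data frame and that startTime and endTime are offsets in seconds (the source does not state the unit):

```r
# Stand-in for a getUtterances() result, copied from the sample rows above.
utterances <- data.frame(
  uid       = c(9, 10, 11, 12, 13),
  who       = c("CHI", "FAT", "CHI", "FAT", "CHI"),
  startTime = c(33.398, 33.714, 34.884, 35.999, 37.818),
  endTime   = c(33.714, 34.884, 35.999, 37.818, 38.394),
  stringsAsFactors = FALSE
)

# Duration of each utterance (assumed unit: seconds).
utterances$duration <- utterances$endTime - utterances$startTime

# Total speaking time per speaker in this excerpt.
time_by_speaker <- tapply(utterances$duration, utterances$who, sum)
print(round(time_by_speaker, 3))
```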

getNgrams

Queries to get n-grams of specified size (n) and type.

  • Speaker’s role.
  • The n-gram (word, stem, or part-of-speech). See CHAT manual for part-of-speech code values.
  • Frequency count of n-gram.
# Get 3-grams of words from one transcript.
ngrams <- getNgrams(nGram = c("3", "word"),
                    corpusName = 'childes',
                    corpora = c('childes',
                                'Eng-NA',
                                'MacWhinney',
                                '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role          ngram           count
Target_Child  dad could we    1
Target_Child  could we turn   1
Target_Child  we turn that    1
Target_Child  out out out     2
Target_Child  them them them  3
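
To make the notion of a word 3-gram concrete, here is a local base-R sketch that slides a window of n words over a sequence. The helper word_ngrams is hypothetical and is not part of TBDBr, and how the server treats utterance boundaries is not covered here. The input words are chosen to reproduce the first three sample rows.

```r
# Hypothetical helper (not part of TBDBr): all runs of n consecutive words.
word_ngrams <- function(words, n) {
  # A sequence shorter than n contains no n-grams.
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Illustrative word sequence, assumed for this example.
words <- c("dad", "could", "we", "turn", "that")
trigrams <- word_ngrams(words, 3)
print(trigrams)

# Frequency count per n-gram, analogous to the count column.
print(table(trigrams))
```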

getCQL

Querying by “CQL” (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech. See the documentation (?getCQL) for details.

# Query for text pattern "my ball" as lemma in MacWhinney set.
cql.myball <- getCQL(cqlArr = list(list(type = "lemma", item = "my", freq = "once"),
                                   list(type = "lemma", item = "ball", freq = "once")),
                     corpusName = 'childes',
                     corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role          monthage  docid  uid  who  utt                                               filename  path
Target_Child  30        19742  23   CHI  I hafta get << my ball >>                         020627    childes/Eng-NA/MacWhinney/020627
Target_Child  36        19786  85   CHI  but I forgot << my ball >>                        030018b   childes/Eng-NA/MacWhinney/030018b
Target_Child  40        19811  373  CHI  I threw << my ball >> over that fence next doors  030429    childes/Eng-NA/MacWhinney/030429
Target_Child  41        19812  6    CHI  I throw << my ball >> back to their house         030512    childes/Eng-NA/MacWhinney/030512
Target_Child  41        19815  249  CHI  can he play with << my ball >>                    030526a   childes/Eng-NA/MacWhinney/030526a
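
The cqlArr argument above is a list with one entry per pattern element, and the matched span comes back wrapped in << >> inside utt. The sketch below builds the same list from a vector of lemmas and pulls the matched text out of two sample utterances. The helper lemma_sequence is hypothetical (not part of TBDBr) and uses only the fields shown in the call above; see ?getCQL for the accepted values.

```r
# Hypothetical helper (not part of TBDBr): one "lemma" element per word,
# each required to occur once, in the form used by the call above.
lemma_sequence <- function(lemmas) {
  lapply(lemmas, function(w) list(type = "lemma", item = w, freq = "once"))
}

cql_arr <- lemma_sequence(c("my", "ball"))
str(cql_arr)

# Sample utt values copied from the output above.
utt <- c("I hafta get << my ball >>",
         "I threw << my ball >> over that fence next doors")

# Extract the text between the << and >> markers
# (assumes exactly one marked span per utterance).
matched <- sub("^.*<< (.*) >>.*$", "\\1", utt)
print(matched)
```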