`getTranscripts()` queries a table in which each row represents a transcript.
# Get English-North American transcripts in CHILDES
transcripts <- getTranscripts(corpusName = 'childes',
                              corpora = c('childes', 'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
path | filename | languages | media | date | pid | designType | activityType | groupType |
---|---|---|---|---|---|---|---|---|
childes/Eng-NA/Bates/Free20/amy | amy | eng | NULL | NULL | 11312/c-00015218-1 | cross | toyplay | TD |
childes/Eng-NA/Bates/Free20/betty | betty | eng | NULL | NULL | 11312/c-00015219-1 | cross | toyplay | TD |
childes/Eng-NA/Bates/Free20/chuck | chuck | eng | NULL | NULL | 11312/c-00015220-1 | cross | toyplay | TD |
childes/Eng-NA/Bates/Free20/doug | doug | eng | NULL | NULL | 11312/c-00015221-1 | cross | toyplay | TD |
childes/Eng-NA/Bates/Free20/ed | ed | eng | NULL | NULL | 11312/c-00015222-1 | cross | toyplay | TD |
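The `path` column records where each transcript sits in the collection hierarchy (collection, language group, corpus, subfolder, file). The base-R sketch below rebuilds three rows from the output above as a plain data frame and splits `path` to recover the corpus and subfolder. It assumes the query result is an ordinary data frame and that every path follows the same five-level layout as these rows. With live data, the same steps would apply to `transcripts` itself.

```r
# Three rows copied from the getTranscripts() output above (selected columns).
transcripts_sample <- data.frame(
  path = c("childes/Eng-NA/Bates/Free20/amy",
           "childes/Eng-NA/Bates/Free20/betty",
           "childes/Eng-NA/Bates/Free20/chuck"),
  filename = c("amy", "betty", "chuck"),
  activityType = c("toyplay", "toyplay", "toyplay"),
  stringsAsFactors = FALSE
)

# Split each path into its levels, e.g. "childes" "Eng-NA" "Bates" "Free20" "amy".
levels_list <- strsplit(transcripts_sample$path, "/", fixed = TRUE)

# Assumes the five-level layout above: level 3 is the corpus, level 4 the subfolder.
transcripts_sample$corpus    <- vapply(levels_list, `[`, character(1), 3)
transcripts_sample$subfolder <- vapply(levels_list, `[`, character(1), 4)

transcripts_sample[, c("filename", "corpus", "subfolder")]
```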
`getParticipants()` queries a table in which each row represents a participant (speaker) listed in a transcript.
# Get English-North American participants in CHILDES
participants <- getParticipants(corpusName = 'childes',
                                corpora = c('childes', 'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
filename | path | who | name | role | language | monthage | age | sex | numwords | numutts | avgutt | medianutt |
---|---|---|---|---|---|---|---|---|---|---|---|---|
amy | childes/Eng-NA/Bates/Free20/amy | CHI | NULL | Target_Child | eng | 20 | 1;08.00 | female | 33 | 32 | 1.03125 | 1 |
amy | childes/Eng-NA/Bates/Free20/amy | MOT | NULL | Mother | eng | NULL | NULL | female | 220 | 80 | 2.75 | 3 |
betty | childes/Eng-NA/Bates/Free20/betty | MOT | NULL | Mother | eng | NULL | NULL | female | 354 | 93 | 3.806452 | 4 |
betty | childes/Eng-NA/Bates/Free20/betty | CHI | Betty | Target_Child | eng | 20 | 1;08.00 | female | 13 | 12 | 1.083333 | 1 |
chuck | childes/Eng-NA/Bates/Free20/chuck | CHI | Chuck | Target_Child | eng | 20 | 1;08.00 | male | 67 | 48 | 1.395833 | 1 |
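In the rows above, `avgutt` equals `numwords / numutts` (for example, 220 / 80 = 2.75 for Amy's mother), so it appears to be the mean number of words per utterance. The minimal sketch below checks this and keeps only the target children. It uses rows copied from the output above and represents the `NULL` entries as `NA`, which is an assumption about how they arrive in R.

```r
# Rows copied from the getParticipants() output above (selected columns).
# NULL entries are stored as NA here (an assumption about the R representation).
participants_sample <- data.frame(
  filename = c("amy", "amy", "betty", "betty", "chuck"),
  role     = c("Target_Child", "Mother", "Mother", "Target_Child", "Target_Child"),
  monthage = c(20, NA, NA, 20, 20),
  numwords = c(33, 220, 354, 13, 67),
  numutts  = c(32, 80, 93, 12, 48),
  avgutt   = c(1.03125, 2.75, 3.806452, 1.083333, 1.395833),
  stringsAsFactors = FALSE
)

# avgutt matches numwords / numutts up to the rounding shown in the output.
recomputed <- participants_sample$numwords / participants_sample$numutts
stopifnot(isTRUE(all.equal(recomputed, participants_sample$avgutt, tolerance = 1e-6)))

# Keep only the target children.
children <- subset(participants_sample, role == "Target_Child")
children[, c("filename", "monthage", "avgutt")]
```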
`getTokens()` queries a table with all the words from the selected transcripts, one word (token) per row.
# Get tokens (words) from one transcript
tokens <- getTokens(corpusName = 'childes',
                    corpora = c('childes', 'Eng-NA', 'MacWhinney', '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
filename | path | uid | wordnum | role | who | word | stem | pos |
---|---|---|---|---|---|---|---|---|
010411a | childes/Eng-NA/MacWhinney/010411a | 0 | 0 | Father | FAT | wanna | want | v |
010411a | childes/Eng-NA/MacWhinney/010411a | 0 | 1 | Father | FAT | give | give | v |
010411a | childes/Eng-NA/MacWhinney/010411a | 0 | 2 | Father | FAT | me | me | pro:obj |
010411a | childes/Eng-NA/MacWhinney/010411a | 0 | 3 | Father | FAT | a | a | det:art |
010411a | childes/Eng-NA/MacWhinney/010411a | 0 | 4 | Father | FAT | kiss | kiss | n |
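Because `getTokens()` returns one row per word, utterances can be reassembled from the tokens. The sketch below assumes, based on the rows above, that `uid` identifies an utterance and `wordnum` gives the 0-based position of a word within it. It orders the tokens and pastes each utterance's words (and, separately, their stems) back together.

```r
# Rows copied from the getTokens() output above (selected columns).
tokens_sample <- data.frame(
  uid     = c(0, 0, 0, 0, 0),
  wordnum = c(0, 1, 2, 3, 4),
  who     = "FAT",
  word    = c("wanna", "give", "me", "a", "kiss"),
  stem    = c("want", "give", "me", "a", "kiss"),
  stringsAsFactors = FALSE
)

# Sort by utterance, then by word position within the utterance.
tokens_sample <- tokens_sample[order(tokens_sample$uid, tokens_sample$wordnum), ]

# Paste each utterance's words (and stems) back together, keyed by uid.
rebuilt       <- tapply(tokens_sample$word, tokens_sample$uid, paste, collapse = " ")
rebuilt_stems <- tapply(tokens_sample$stem, tokens_sample$uid, paste, collapse = " ")
rebuilt
rebuilt_stems
```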
`getTokenTypes()` queries a table with all the words from the selected transcripts, condensed into “types” based on word form and part of speech.
# Get token types from the MacWhinney corpus
token.types <- getTokenTypes(corpusName = 'childes',
                             corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role | word | count | pos | stem |
---|---|---|---|---|
Father | you | 14336 | pro:per | you |
Target_Child | I | 10430 | pro:sub | I |
Target_Child | and | 9653 | coord | and |
Target_Child | the | 8860 | det:art | the |
Father | the | 8525 | det:art | the |
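The token-type table appears to have one row per role, word, and part of speech, so the same word can be listed once for each speaker role (as `the` is above, for both `Target_Child` and `Father`). To get a single count per word regardless of role, the counts can be summed, as in this base-R sketch built from the rows shown.

```r
# Rows copied from the getTokenTypes() output above.
token_types_sample <- data.frame(
  role  = c("Father", "Target_Child", "Target_Child", "Target_Child", "Father"),
  word  = c("you", "I", "and", "the", "the"),
  count = c(14336, 10430, 9653, 8860, 8525),
  pos   = c("pro:per", "pro:sub", "coord", "det:art", "det:art"),
  stringsAsFactors = FALSE
)

# Sum counts over roles, giving one row per word and part of speech.
totals <- aggregate(count ~ word + pos, data = token_types_sample, FUN = sum)

# Most frequent first.
totals <- totals[order(-totals$count), ]
totals
```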
`getUtterances()` queries a table with all the utterances from the selected transcripts, one utterance per row.
# Get utterances from one transcript
utterances <- getUtterances(corpusName = 'childes',
                            corpora = c('childes', 'Eng-NA', 'MacWhinney', '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
| | filename | path | utt_num | who | role | postcodes | gems | utterance | startTime | endTime |
---|---|---|---|---|---|---|---|---|---|---|
10 | 010411a | childes/Eng-NA/MacWhinney/010411a | 9 | CHI | Target_Child | NULL | NULL | what’s that | 33.398 | 33.714 |
11 | 010411a | childes/Eng-NA/MacWhinney/010411a | 10 | FAT | Father | NULL | NULL | taperecorder over there | 33.714 | 34.884 |
12 | 010411a | childes/Eng-NA/MacWhinney/010411a | 11 | CHI | Target_Child | NULL | NULL | hm | 34.884 | 35.999 |
13 | 010411a | childes/Eng-NA/MacWhinney/010411a | 12 | FAT | Father | NULL | NULL | do you have some nice little things to say to it | 35.999 | 37.818 |
14 | 010411a | childes/Eng-NA/MacWhinney/010411a | 13 | CHI | Target_Child | NULL | NULL | hi | 37.818 | 38.394 |
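With `startTime` and `endTime`, each utterance's duration is their difference, and splitting `utterance` on whitespace gives a word count per utterance. The sketch below uses the five rows shown above and computes the mean utterance length in words for each speaker code. It assumes the times are in seconds, which the output does not state.

```r
# Rows copied from the getUtterances() output above (selected columns).
utterances_sample <- data.frame(
  who       = c("CHI", "FAT", "CHI", "FAT", "CHI"),
  utterance = c("what's that", "taperecorder over there", "hm",
                "do you have some nice little things to say to it", "hi"),
  startTime = c(33.398, 33.714, 34.884, 35.999, 37.818),
  endTime   = c(33.714, 34.884, 35.999, 37.818, 38.394),
  stringsAsFactors = FALSE
)

# Duration of each utterance (assumed to be seconds).
utterances_sample$duration <- utterances_sample$endTime - utterances_sample$startTime

# Word count per utterance: split on runs of whitespace.
utterances_sample$nwords <- lengths(strsplit(utterances_sample$utterance, "\\s+"))

# Mean utterance length in words, per speaker code.
mlu <- tapply(utterances_sample$nwords, utterances_sample$who, mean)
mlu
```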
`getNgrams()` queries n-grams of a specified size (n) and type.
# Get word 3-grams from one transcript
ngrams <- getNgrams(nGram = c("3", "word"),
                    corpusName = 'childes',
                    corpora = c('childes', 'Eng-NA', 'MacWhinney', '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role | ngram | count |
---|---|---|
Target_Child | dad could we | 1 |
Target_Child | could we turn | 1 |
Target_Child | we turn that | 1 |
Target_Child | out out out | 2 |
Target_Child | them them them | 3 |
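A word 3-gram is a run of three consecutive words. To illustrate what `nGram = c("3", "word")` counts, the sketch below defines a hypothetical helper, `word_ngrams()` (not part of the package), that slides a window of size n over the words of one utterance. Applied to the father's utterance from the `getTokens()` example, it yields three trigrams; a second, hypothetical utterance shows how `table()` tallies repeats. Whether the package's n-grams can cross utterance boundaries is not stated here, so the helper works within a single utterance only.

```r
# Hypothetical helper (not part of the package): word n-grams within one utterance.
word_ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Words of the father's utterance from the getTokens() example.
trigrams <- word_ngrams(c("wanna", "give", "me", "a", "kiss"), 3)
trigrams

# Tally repeated n-grams in a hypothetical utterance "out out out out".
table(word_ngrams(c("out", "out", "out", "out"), 3))
```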
Querying with CQL (Corpus Query Language) lets us search the selected transcripts for patterns. A CQL query is built by specifying a search pattern of words, lemmas, and parts of speech. See the documentation (`?getCQL`) for details.
# Query for the lemma pattern "my ball" in the MacWhinney corpus
cql.myball <- getCQL(cqlArr = list(list(type = "lemma", item = "my", freq = "once"),
                                   list(type = "lemma", item = "ball", freq = "once")),
                     corpusName = 'childes',
                     corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"
role | monthage | docID | uid | who | utt | filename | path |
---|---|---|---|---|---|---|---|
Target_Child | 30 | childes/Eng-NA/MacWhinney/020627 | 23 | CHI | I hafta get << my ball >> | 020627 | childes/Eng-NA/MacWhinney/020627 |
Target_Child | 36 | childes/Eng-NA/MacWhinney/030018b | 85 | CHI | but I forgot << my ball >> | 030018b | childes/Eng-NA/MacWhinney/030018b |
Target_Child | 44 | childes/Eng-NA/MacWhinney/030805 | 147 | CHI | I magiced << my ball >> away | 030805 | childes/Eng-NA/MacWhinney/030805 |
Target_Child | 40 | childes/Eng-NA/MacWhinney/030429 | 373 | CHI | I threw << my ball >> over that fence next doors | 030429 | childes/Eng-NA/MacWhinney/030429 |
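In the `utt` column, the matched span is marked with `<< >>`. The small base-R sketch below splits marked utterances into left context, match, and right context (a keyword-in-context view), using three `utt` values copied from the output above. It assumes each utterance contains exactly one marked span.

```r
# utt values copied from the getCQL() output above.
utt_sample <- c("I hafta get << my ball >>",
                "but I forgot << my ball >>",
                "I magiced << my ball >> away")

# Text before the "<<" marker (left context).
left <- sub("\\s*<<.*$", "", utt_sample)
# Text between "<<" and ">>" (the matched CQL span).
hit <- sub("^.*<<\\s*(.*?)\\s*>>.*$", "\\1", utt_sample, perl = TRUE)
# Text after the ">>" marker (right context, possibly empty).
right <- sub("^.*>>\\s*", "", utt_sample)

data.frame(left, hit, right)
```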