Example Queries

getTranscripts

Queries a table where each row represents a transcript.

Link to view transcript and play any associated media.
Corpus path to transcript.
Media types (audio/video) linked to transcript.
Unique ID for transcript (PID).
Languages spoken.
Date recorded.
Design Type.
Activity Type.
Group Type.

# Get English-North American transcripts in childes
transcripts <- getTranscripts(corpusName = 'childes',
                              corpora = c('childes', 'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

path	filename	languages	media	date	pid	designType	activityType	groupType
childes/Eng-NA/Bates/Free20/amy	amy	eng	NULL	NULL	11312/c-00015218-1	cross	toyplay	TD
childes/Eng-NA/Bates/Free20/betty	betty	eng	NULL	NULL	11312/c-00015219-1	cross	toyplay	TD
childes/Eng-NA/Bates/Free20/chuck	chuck	eng	NULL	NULL	11312/c-00015220-1	cross	toyplay	TD
childes/Eng-NA/Bates/Free20/doug	doug	eng	NULL	NULL	11312/c-00015221-1	cross	toyplay	TD
childes/Eng-NA/Bates/Free20/ed	ed	eng	NULL	NULL	11312/c-00015222-1	cross	toyplay	TD
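
In the example above, the elements of the corpora vector appear to mirror the segments of the path column (childes, then Eng-NA). The sketch below assumes that correspondence, and assumes the result is an ordinary data frame. It uses a small stand-in data frame copied from the rows shown rather than a live query, and narrows it to one sub-corpus with base R.

```r
# Stand-in for the result of getTranscripts(); values copied from the table above.
transcripts <- data.frame(
  path = c("childes/Eng-NA/Bates/Free20/amy",
           "childes/Eng-NA/Bates/Free20/betty",
           "childes/Eng-NA/Bates/Free20/chuck"),
  filename = c("amy", "betty", "chuck"),
  stringsAsFactors = FALSE
)

# Assumption: each element of 'corpora' is one segment of the corpus path.
corpora <- c("childes", "Eng-NA", "Bates")
prefix <- paste0(paste(corpora, collapse = "/"), "/")

# Keep only the transcripts whose path starts with that prefix.
bates <- transcripts[startsWith(transcripts$path, prefix), ]

# In these rows the last path segment matches the filename column.
all(basename(bates$path) == bates$filename)
```

Filtering locally like this is only a convenience for results already fetched; passing a longer corpora vector to the query, as the later examples do, avoids downloading rows that are not needed.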

getParticipants

Queries a table where each row represents a participant (speaker) listed in a transcript.

Link to view transcript and play any associated media.
Corpus path to transcript.
Speaker’s ID.
Speaker’s name.
Speaker’s role.
Speaker’s language.
Speaker’s age in months.
Speaker’s age in Years/Months/Days.
Speaker’s gender.
Number of words spoken by speaker.
Number of utterances spoken by speaker.
Average number of words per speaker’s utterance.
Median number of words per speaker’s utterance.

# Get English-North American participants in childes
participants <- getParticipants(corpusName = 'childes',
                                corpora = c('childes',
                                            'Eng-NA'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

filename	path	who	name	role	language	monthage	age	sex	numwords	numutts	avgutt	medianutt
amy	childes/Eng-NA/Bates/Free20/amy	CHI	NULL	Target_Child	eng	20	1;08.00	female	33	32	1.03125	1
amy	childes/Eng-NA/Bates/Free20/amy	MOT	NULL	Mother	eng	NULL	NULL	female	220	80	2.75	3
betty	childes/Eng-NA/Bates/Free20/betty	MOT	NULL	Mother	eng	NULL	NULL	female	354	93	3.806452	4
betty	childes/Eng-NA/Bates/Free20/betty	CHI	Betty	Target_Child	eng	20	1;08.00	female	13	12	1.083333	1
chuck	childes/Eng-NA/Bates/Free20/chuck	CHI	Chuck	Target_Child	eng	20	1;08.00	male	67	48	1.395833	1
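
The avgutt column is described as the average number of words per utterance, which suggests it equals numwords divided by numutts. The sketch below checks that reading against the rows shown, using a stand-in data frame rather than a live query (the data frame return type is an assumption), and then pools the target children into a single words-per-utterance figure.

```r
# Stand-in for the result of getParticipants(); values copied from the table above.
participants <- data.frame(
  filename = c("amy", "amy", "betty", "betty", "chuck"),
  who      = c("CHI", "MOT", "MOT", "CHI", "CHI"),
  role     = c("Target_Child", "Mother", "Mother", "Target_Child", "Target_Child"),
  numwords = c(33, 220, 354, 13, 67),
  numutts  = c(32, 80, 93, 12, 48),
  avgutt   = c(1.03125, 2.75, 3.806452, 1.083333, 1.395833),
  stringsAsFactors = FALSE
)

# Recompute words per utterance and compare with the reported avgutt,
# allowing for rounding in the printed values.
recomputed <- participants$numwords / participants$numutts
all(abs(recomputed - participants$avgutt) < 1e-6)

# Pool all target children: total words over total utterances.
children <- participants[participants$role == "Target_Child", ]
child_words_per_utt <- sum(children$numwords) / sum(children$numutts)
```

Pooling totals weights each child by how much they spoke; averaging the per-child avgutt values instead would weight every child equally.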

getTokens

Queries a table with all the words from the selected transcripts, one word (token) per row.

Link to view transcript and play any associated media.
Corpus path to transcript.
Utterance sequence number (starts at 0).
Word sequence number within utterance (starts at 0).
Speaker’s role.
Speaker’s ID.
The word (token).
The word’s stem.
Part of speech code (see the CHAT manual for descriptions of codes).

# Get tokens (words) from one transcript.
tokens <- getTokens(corpusName = 'childes',
                    corpora = c('childes',
                                'Eng-NA',
                                'MacWhinney',
                                '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

filename	path	wordnum	role	who	word	stem	pos
010411a	childes/Eng-NA/MacWhinney/010411a	0	Father	FAT	wanna	want	v
010411a	childes/Eng-NA/MacWhinney/010411a	1	Father	FAT	give	give	v
010411a	childes/Eng-NA/MacWhinney/010411a	2	Father	FAT	me	me	pro:obj
010411a	childes/Eng-NA/MacWhinney/010411a	3	Father	FAT	a	a	det:art
010411a	childes/Eng-NA/MacWhinney/010411a	4	Father	FAT	kiss	kiss	n

getTokenTypes

Queries a table with all the words from the selected transcripts condensed into “types” based on word form and part of speech.

Speaker’s role.
The word.
Number of occurrences of word in selected transcripts.
Part of speech (see the CHAT manual for descriptions of codes).
The word’s stem.

# Get token types from MacWhinney set.
token.types <- getTokenTypes(corpusName = 'childes',
                             corpora = c('childes',
                                         'Eng-NA',
                                         'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

role	word	count	pos	stem
Father	you	14336	pro:per	you
Target_Child	I	10430	pro:sub	I
Target_Child	and	9653	coord	and
Target_Child	the	8860	det:art	the
Father	the	8525	det:art	the
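
Because types are reported separately for each speaker role, the same word can appear on more than one row (for example "the" for both Father and Target_Child). The sketch below sums counts across roles with base R. It uses a stand-in data frame holding only the five rows shown, so the totals are illustrative and not totals for the whole MacWhinney set; the data frame return type is an assumption.

```r
# Stand-in for the result of getTokenTypes(); values copied from the table above.
token.types <- data.frame(
  role  = c("Father", "Target_Child", "Target_Child", "Target_Child", "Father"),
  word  = c("you", "I", "and", "the", "the"),
  count = c(14336, 10430, 9653, 8860, 8525),
  stringsAsFactors = FALSE
)

# Sum the counts for each word form, ignoring who said it.
totals <- tapply(token.types$count, token.types$word, sum)

# Order the words from most to least frequent.
ordered <- totals[order(totals, decreasing = TRUE)]

# Combined count for "the" across both roles in these rows.
the_total <- unname(totals["the"])
```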

getUtterances

Queries a table with all the utterances from the selected transcripts, one utterance per row.

Link to view transcript and play any associated media.
Corpus path to transcript.
Utterance sequence number (starts at 0).
Word sequence number within utterance (starts at 0).
Speaker’s ID.
Speaker’s role.
Utterance postcodes.
Utterance GEMS.
Utterance.
Start time of utterance in associated media.
End time of utterance in associated media.

# Get utterances from one transcript.
utterances <- getUtterances(corpusName = 'childes',
                            corpora = c('childes',
                                        'Eng-NA',
                                        'MacWhinney',
                                        '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

	filename	path	utt_num	who	role	postcodes	gems	utterance	startTime	endTime
10	010411a	childes/Eng-NA/MacWhinney/010411a	9	CHI	Target_Child	NULL	NULL	what’s that	33.398	33.714
11	010411a	childes/Eng-NA/MacWhinney/010411a	10	FAT	Father	NULL	NULL	taperecorder over there	33.714	34.884
12	010411a	childes/Eng-NA/MacWhinney/010411a	11	CHI	Target_Child	NULL	NULL	hm	34.884	35.999
13	010411a	childes/Eng-NA/MacWhinney/010411a	12	FAT	Father	NULL	NULL	do you have some nice little things to say to it	35.999	37.818
14	010411a	childes/Eng-NA/MacWhinney/010411a	13	CHI	Target_Child	NULL	NULL	hi	37.818	38.394

getNgrams

Queries to get n-grams of specified size (n) and type.

Speaker’s role.
The n-gram (word, stem, or part-of-speech). See CHAT manual for part-of-speech code values.
Frequency count of n-gram.

# Get 3-grams of words from one transcript.
ngrams <- getNgrams(nGram=c("3", "word"),
                    corpusName = 'childes',
                    corpora = c('childes',
                                'Eng-NA',
                                'MacWhinney',
                                '010411a'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

role	ngram	count
Target_Child	dad could we	1
Target_Child	could we turn	1
Target_Child	we turn that	1
Target_Child	out out out	2
Target_Child	them them them	3
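
To make the notion of a word 3-gram concrete, the sketch below slides a window of size n over a vector of tokens. It illustrates the general idea only and is not the package's or the server's implementation; make_ngrams is a hypothetical helper, and the token sequence is pieced together from the first three rows of the output above.

```r
# Hypothetical helper: return all contiguous n-grams of a token vector.
make_ngrams <- function(tokens, n) {
  # Fewer than n tokens means there is no complete n-gram.
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Assumed token sequence, reconstructed from the overlapping rows shown above.
tokens <- c("dad", "could", "we", "turn", "that")
trigrams <- make_ngrams(tokens, 3)

# Frequency count of each n-gram, analogous to the count column.
trigram_counts <- table(trigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, so the five tokens here give three 3-grams, each occurring once.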

getCQL

Querying by “CQL” (Corpus Query Language) lets us search for patterns in the selected transcripts. We construct a CQL query by specifying a search pattern of words, lemmas, and parts of speech. See the documentation (?getCQL) for details.

# Query for text pattern "my ball" as lemma in MacWhinney set.
cql.myball <- getCQL(cqlArr=list(list(type="lemma", item="my", freq="once"),
                                 list(type="lemma", item="ball", freq="once")), 
                     corpusName = 'childes',
                     corpora = c('childes', 'Eng-NA', 'MacWhinney'))
#> [1] "Fetching data, please wait..."
#> [1] "Success!"

role	monthage	docID	uid	who	utt	filename	path
Target_Child	30	childes/Eng-NA/MacWhinney/020627	23	CHI	I hafta get << my ball >>	020627	childes/Eng-NA/MacWhinney/020627
Target_Child	36	childes/Eng-NA/MacWhinney/030018b	85	CHI	but I forgot << my ball >>	030018b	childes/Eng-NA/MacWhinney/030018b
Target_Child	44	childes/Eng-NA/MacWhinney/030805	147	CHI	I magiced << my ball >> away	030805	childes/Eng-NA/MacWhinney/030805
Target_Child	40	childes/Eng-NA/MacWhinney/030429	373	CHI	I threw << my ball >> over that fence next doors	030429	childes/Eng-NA/MacWhinney/030429
Target_Child	40	childes/Eng-NA/MacWhinney/030429	373	CHI	I threw << my ball >> over that fence next doors	030429	childes/Eng-NA/MacWhinney/030429
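
In the utt column the matched span is wrapped in << and >> markers. The sketch below pulls out the matched text and also produces a marker-free copy of each utterance with base R regular expressions. It works on a stand-in data frame holding three of the rows shown rather than a live query, and it assumes one marked span per utterance and a data frame return type.

```r
# Stand-in for the result of getCQL(); values copied from the table above.
cql.myball <- data.frame(
  monthage = c(30, 36, 44),
  utt = c("I hafta get << my ball >>",
          "but I forgot << my ball >>",
          "I magiced << my ball >> away"),
  stringsAsFactors = FALSE
)

# Text between the markers (assumes a single match per utterance).
matched <- sub(".*<< (.*) >>.*", "\\1", cql.myball$utt)

# The utterance with the markers stripped out.
plain <- gsub("<< | >>", "", cql.myball$utt)
```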