Poster presented at CUNY 2015 (PDF)
Informativity in adaptation: Supervised and unsupervised learning of linguistic cue distributions
- Do people use informative labels during adaptation?
- You don't stop learning language as an adult: you need to learn, or adapt to, the language produced by every new talker you meet.
- Adapting to unusual productions is much easier when other cues tell you what the talker meant to say (labeled) than when you are uncertain about the intended category (unlabeled).
- But no studies have directly compared labeled and unlabeled adaptation.
Background
- Categories (/b/ and /p/) are distributions of cues (VOT, f0, etc.); see the sketch after this list.
- Distributional learning:
    - Acquisition: learn the language's distributions
    - Adaptation: learn a talker's distributions
    - Same underlying process?
- Acquisition is slow and hard; adaptation is fast and easy. Why?
- Labels: context provides lots of information (visual, lexical, etc.) that labels cues for the listener.
- Do listeners actually use labels for adaptation when they're provided? There hasn't been a direct test.
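To make the idea of categories as cue distributions concrete, here is a minimal sketch in R. The means and standard deviation are illustrative values, not the experiments' actual stimulus parameters: an ideal listener's /b/-/p/ boundary is the VOT where the two category distributions cross.

```r
# Minimal sketch: categories as Gaussian distributions over VOT.
# Means and SD below are illustrative, not the experiments' actual parameters.
b_mean <- 0    # hypothetical /b/ mean VOT (ms)
p_mean <- 40   # hypothetical /p/ mean VOT (ms)
sd_vot <- 10   # shared standard deviation (ms)

vot <- seq(-20, 80, by = 1)

# Posterior probability of /p/ given VOT, assuming equal priors:
p_given_vot <- dnorm(vot, p_mean, sd_vot) /
  (dnorm(vot, p_mean, sd_vot) + dnorm(vot, b_mean, sd_vot))

# With equal variances and priors, the ideal boundary is the midpoint:
boundary <- (b_mean + p_mean) / 2

plot(vot, p_given_vot, type = 'l', xlab = 'VOT (ms)', ylab = 'P(/p/ | VOT)')
abline(v = boundary, lty = 2)
```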
Preliminaries
Set things up and load data:

```r
library(knitr)
knitr::opts_chunk$set(cache=TRUE,
                      autodep=TRUE,
                      dev=c('png', 'pdf', 'svg'),
                      fig.retina=2,
                      warning=FALSE,
                      message=FALSE)

library(devtools)
if (! require('supunsup', quietly=TRUE)) {
  devtools::install_github('kleinschmidt/phonetic-sup-unsup')
  require('supunsup')
}

# pre-parsed + excluded data from the package
dat <- supunsup::supunsup_clean
```
Fit a regression model:

```r
library(lme4)
library(dplyr)

dat_mod <- dat %>%
  filter(trialSupCond == 'unsupervised' | supCond == 'unsupervised') %>%
  mutate_for_lmer()

dat_fit <- glmer(respP ~ trial.s * vot_rel.s * supCond * bvotCond +
                   (trial.s * vot_rel.s | subject),
                 data = dat_mod,
                 family = 'binomial',
                 control = glmerControl(optimizer = 'bobyqa'))
```
Lay the groundwork for visualization:

```r
library(ggplot2)
theme_set(theme_bw())

four_colors <-
  c("#255984", # blue
    "#CC8A2E", # yellow
    "#CC5B2E", # red-orange
    "#208C5E") # green

four_colors_saturated <-
  c("#1461A1",
    "#F99710",
    "#F95210",
    "#0BAB66")

scale_color_discrete <- function(...) {
  scale_color_manual(values = four_colors_saturated, ...)
}

scale_fill_discrete <- function(...) {
  scale_fill_manual(values = four_colors_saturated, ...)
}

# size for three-panel results figures
res_w <- 11
res_h <- 5

# formatting boilerplate:
format_results_plot <- function(p) {
  p +
    scale_color_discrete('Shift (ms)', drop=FALSE) +
    scale_fill_discrete('Shift (ms)', drop=FALSE) +
    scale_x_continuous('VOT (ms)', breaks=seq(-20, 80, by=20)) +
    scale_y_continuous('Proportion /p/ response') +
    scale_linetype_discrete('Condition')
}

plot_category_bounds <- function(cat_bounds, dodge_w = 0.75) {
  ggplot(cat_bounds,
         aes(x=factor(bvotCond, levels=rev(levels(bvotCond))),
             y=boundary_vot,
             ymin=boundary_vot - 1.96*boundary_vot_se,
             ymax=boundary_vot + 1.96*boundary_vot_se,
             color=bvotCond,
             linetype=factor(supCond,
                             levels=c('unsupervised', 'supervised', 'mixed')),
             group=paste(shift, supCond))) +
    geom_pointrange(size=1.5, position=position_dodge(w=dodge_w)) +
    geom_point(aes(y=boundary_vot_true), shape=1, size=6) +
    scale_color_discrete(drop=FALSE) +
    scale_linetype_manual(drop=FALSE,
                          values = c(1, 2, 3)) +
    theme(legend.position='none') +
    scale_x_discrete('Shift (ms VOT)') +
    scale_y_continuous('/b/-/p/ boundary (ms VOT)',
                       breaks = seq(10, 60, by=5)) +
    coord_flip()
}
```
From the fitted model, generate predictions to visualize model fits and measure fitted category boundaries:

```r
add_experiment <- function(data_) {
  data_ %>%
    mutate(experiment =
             ifelse(supCond == 'mixed', 'Experiment 4',
                    ifelse(bvotCond == 20, 'Experiment 2',
                           ifelse(bvotCond == 30, 'Experiment 3',
                                  'Experiment 1'))))
}

dat_pred <- make_prediction_data(dat, dat_mod) %>% add_experiment

# raw average /p/-response probability
respP_by_thirds <- dat %>%
  mutate(thirds=ntile(trial, 3)) %>%
  select(-trial) %>%
  left_join(bin_trials(dat)) %>%
  group_by(supCond, trialSupCond, trial_range, bvotCond, vot) %>%
  summarise(respP = mean(respP)) %>%
  add_experiment

respP_by_thirds_unlab <- respP_by_thirds %>%
  mutate(type='data') %>%   # for plotting along w/ glmer fits
  filter(trialSupCond == 'unsupervised')

cat_bounds <- category_boundaries(dat_mod, dat_fit) %>% add_experiment
```
Methods
Subjects

```r
n_subj <- supunsup::supunsup %>%
  group_by(subject) %>%
  summarise() %>%
  nrow()

n_subj_excluded <- supunsup::supunsup_excluded %>%
  group_by(subject) %>%
  summarise() %>%
  nrow()
```
We ran 368 subjects on Mechanical Turk and excluded 26 for chance performance, leaving data from 342 for analysis. There were about 29 subjects in each condition:

```r
supunsup::supunsup_clean %>%
  group_by(supCond, bvotCond, subject) %>%
  summarise() %>%
  tally() %>%
  kable(caption = 'Subjects in each condition')
```
| supCond      | bvotCond | n  |
|--------------|----------|----|
| mixed        | 0        | 30 |
| mixed        | 10       | 28 |
| mixed        | 20       | 29 |
| mixed        | 30       | 29 |
| supervised   | 0        | 31 |
| supervised   | 10       | 30 |
| supervised   | 20       | 27 |
| supervised   | 30       | 29 |
| unsupervised | 0        | 26 |
| unsupervised | 10       | 27 |
| unsupervised | 20       | 30 |
| unsupervised | 30       | 29 |
Procedure
On each trial, subjects heard a spoken /b/-/p/ minimal-pair word (beach/peach, bees/peas, beak/peak) and clicked on the picture matching the word they heard. On labeled trials, only one picture could match (e.g., bees and peach). On unlabeled trials, both members of the minimal pair were present (e.g., beach and peach). Subjects heard 222 trials, with the mixture of VOT values and labeled/unlabeled trials determined by the condition they were randomly assigned to. A sketch of this trial logic follows.
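Here is a hypothetical sketch of how the response options depend on trial type. The function and its structure are illustrative only, not the actual experiment code:

```r
# Hypothetical sketch of response options by trial type (illustrative only;
# not the actual experiment code).
minimal_pairs <- list(c('beach', 'peach'), c('bees', 'peas'), c('beak', 'peak'))

response_options <- function(intended_word, labeled) {
  # find the minimal pair containing the intended word
  pair <- Filter(function(p) intended_word %in% p, minimal_pairs)[[1]]
  if (labeled) {
    # labeled trial: distractor comes from a *different* minimal pair,
    # so only one picture can match what was heard
    others <- Filter(function(p) !identical(p, pair), minimal_pairs)
    distractor <- sample(others[[sample(length(others), 1)]], 1)
    c(intended_word, distractor)
  } else {
    # unlabeled trial: both members of the same pair are shown, so the
    # picture choice doesn't disambiguate the category
    pair
  }
}

response_options('bees', labeled = TRUE)    # e.g., "bees" and "peach"
response_options('beach', labeled = FALSE)  # "beach" and "peach"
```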
Conditions and stimulus distributions
Shift conditions
```r
dat %>%
  group_by(bvotCond) %>%
  filter(subject == first(subject)) %>%
  group_by(bvotCond, vot) %>%
  tally() %>%
  ggplot(aes(x=vot, y=n, fill=factor(bvotCond))) +
  geom_bar(stat='identity') +
  facet_grid(.~bvotCond) +
  scale_x_continuous('VOT (ms)', breaks=seq(-20, 80, by=20))
```

(Figure: VOT distributions for each shift condition.)
Supervision conditions
```r
dat %>%
  filter(bvotCond == 0) %>%
  group_by(supCond) %>%
  filter(subject == first(subject)) %>%
  group_by(supCond, labeled, vot) %>%
  tally() %>%
  ggplot(aes(x=vot, y=n, fill=labeled)) +
  geom_bar(stat='identity') +
  scale_fill_manual(values = c('black', 'gray')) +
  facet_grid(.~supCond) +
  scale_x_continuous('VOT (ms)', breaks=seq(-20, 80, by=20))
```

(Figure: labeled vs. unlabeled trial distributions for each supervision condition.)
Measuring learning
Learning was assessed by fitting a mixed-effects logistic regression, extracting the category boundary (the VOT where the predicted response was 50% /b/), and comparing this fitted boundary to the boundary predicted by the input distributions (the maximally ambiguous stimulus, given the /b/ and /p/ distributions).

Because the model includes slopes for trial, we had to pick a point at which to evaluate the category boundaries. We chose the point 5/6 of the way through the experiment, which is near the end but not so close that edge effects add extra uncertainty. A sketch of the boundary calculation follows.
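For intuition, here is a minimal sketch of the boundary logic for a logistic regression with just an intercept and a VOT slope, ignoring trial, subject, and condition (the actual analysis evaluates the full `glmer` fit via `category_boundaries()`):

```r
# Minimal sketch: category boundary from a bare-bones logistic regression
# (the real analysis uses the full glmer model via category_boundaries()).
simple_fit <- glm(respP ~ vot, data = dat, family = 'binomial')

# P(/p/) = 0.5 where the linear predictor is zero, so the boundary is
# -intercept / slope:
b <- coef(simple_fit)
boundary_vot <- -b['(Intercept)'] / b['vot']
boundary_vot
```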
Results
Labeled trials across all experiments
Accuracy on labeled trials (where there was a correct response) was very good:
```r
# standard error of the mean
se <- function(x) {sd(x) / sqrt(length(x))}

# responses and accuracy on labeled trials across all experiments:
labeled_summary <- dat %>%
  filter(labeled == 'labeled') %>%
  add_experiment %>%
  mutate(labelCat = respCategory) %>%
  group_by(labelCat, subject) %>%
  summarise(acc = mean(labelCat == respCat),
            respP = mean(respP)) %>%
  summarise_each(funs(mean, se))

ggplot(labeled_summary, aes(x=labelCat, y=respP_mean)) +
  geom_bar(stat='identity') +
  scale_x_discrete('Labeled as') +
  scale_y_continuous('Proportion /p/ responses', breaks=c(0, 0.5, 1))
```

(Figure: proportion /p/ responses by label on labeled trials.)
98% accurate across all experiments.
Experiment 1

```r
dat_ex1 <- dat %>%
  add_experiment %>%
  filter(experiment == 'Experiment 1')
```
- Distributions: +0 and +10 ms shifts.
- Good learning (fitted boundaries matched the predicted category boundaries).
- No effect of labels.
```r
expt1_respP <- respP_by_thirds_unlab %>%
  filter(experiment == 'Experiment 1')

predict_and_plot(filter(dat_pred, experiment == 'Experiment 1'),
                 dat_fit,
                 show_se=TRUE) %>%
  format_results_plot +
  geom_point(data = expt1_respP, aes(y=respP)) +
  geom_line(data = expt1_respP, aes(y=respP))
```

(Figure: fitted and observed classification functions, Experiment 1.)

```r
cat_bounds %>%
  filter(experiment == 'Experiment 1') %>%
  plot_category_bounds() +
  scale_y_continuous('/b/-/p/ boundary (ms VOT)',
                     breaks = seq(10, 60, by=5),
                     limits = c(18, 32))
```

(Figure: fitted vs. predicted /b/-/p/ boundaries, Experiment 1.)
Experiments 2+3
- Was Experiment 1 too easy? Use bigger shifts (+20 and +30 ms).
- Learning was still present, but not as good.
- Still no effect of labels.
```r
expt23_respP <- respP_by_thirds_unlab %>%
  filter(experiment %in% c('Experiment 2', 'Experiment 3'))

predict_and_plot(filter(dat_pred, experiment %in% c('Experiment 2', 'Experiment 3')),
                 dat_fit,
                 show_se=TRUE) %>%
  format_results_plot +
  geom_point(data = expt23_respP, aes(y=respP)) +
  geom_line(data = expt23_respP, aes(y=respP))
```

(Figure: fitted and observed classification functions, Experiments 2 and 3.)

```r
cat_bounds %>%
  filter(experiment %in% c('Experiment 2', 'Experiment 3')) %>%
  plot_category_bounds()
```

(Figure: fitted vs. predicted /b/-/p/ boundaries, Experiments 2 and 3.)
Experiment 4
- Is learning stimulus-specific? Mix labeled and unlabeled trials within subjects, and compare with the unsupervised conditions from Experiments 1-3.
- All shifts: +0, +10, +20, and +30 ms.
- Nothing changes.
```r
expt4_respP <- respP_by_thirds_unlab %>%
  filter(experiment == 'Experiment 4' | supCond == 'unsupervised')

predict_and_plot(filter(dat_pred, supCond %in% c('mixed', 'unsupervised')),
                 dat_fit,
                 show_se=TRUE) %>%
  format_results_plot +
  geom_point(data = expt4_respP, aes(y=respP)) +
  geom_line(data = expt4_respP, aes(y=respP))
```

(Figure: fitted and observed classification functions, Experiment 4 vs. the unsupervised conditions.)

```r
cat_bounds %>%
  filter(experiment == 'Experiment 4' | supCond == 'unsupervised') %>%
  plot_category_bounds(dodge_w=1)
```

(Figure: fitted vs. predicted /b/-/p/ boundaries, Experiment 4 vs. the unsupervised conditions.)
Conclusion
- Listeners don't use labels to speed up or improve adaptation.
- Adaptation looks more like acquisition: listeners rely on the distributions.
- But other sources of information do matter: there was less adaptation to more unusual distributions (the +20 and +30 ms shifts).
Model output
For the curious and/or masochistic. Trial was centered and scaled to the range \(-0.5\) to \(0.5\); VOT was centered at each subject's predicted category boundary and scaled to continuum steps (+1 is +10 ms). Shift condition was treated as a factor and Helmert-coded, while supervision was sum-coded with unsupervised as the base level. Subject random effects included intercepts and slopes for trial, VOT, and their interaction. (A sketch of this coding scheme follows the stargazer call.)

```r
library(stargazer)

var_name_subs <- list(
  c(':', ' : '),
  c('vot_rel.s', 'VOT'),
  c('bvotCond', 'Shift'),
  c('supCond', 'unsup-vs-'),
  c('trial.s', 'Trial'))

stargazer(dat_fit, float=FALSE, single.row=TRUE,
          covariate.labels = str_replace_multi(names(fixef(dat_fit)),
                                               var_name_subs, TRUE),
          digits = 2, star.cutoffs = c(0.05, 0.01, 0.001),
          align=TRUE,
          intercept.bottom=FALSE, model.numbers=FALSE,
          dep.var.labels.include=FALSE, dep.var.caption='',
          keep.stat = c('n'), type='html')
```
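The contrast setup described above can be sketched with R's built-in coding functions. This is an illustration under stated assumptions; in the actual pipeline, `mutate_for_lmer()` prepares these variables:

```r
# Sketch of the coding scheme described above (illustrative; in the real
# pipeline mutate_for_lmer() prepares these variables).
dat_sketch <- dat_mod

# Shift: four levels (0, 10, 20, 30 ms), Helmert-coded
contrasts(dat_sketch$bvotCond) <- contr.helmert(4)

# Supervision: sum-coded; with 'unsupervised' as the last factor level,
# it acts as the (implicit) base level
contrasts(dat_sketch$supCond) <- contr.sum(3)

# Trial: centered and scaled to the range [-0.5, 0.5]
dat_sketch$trial.s <- with(dat_sketch,
                           (trial - min(trial)) / diff(range(trial)) - 0.5)
```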
|                                             | Estimate (SE)   |
|---------------------------------------------|-----------------|
| (Intercept)                                 | 0.78*** (0.05)  |
| Trial                                       | 0.07 (0.10)     |
| VOT                                         | 1.62*** (0.04)  |
| unsup-vs-mixed                              | 0.04 (0.07)     |
| unsup-vs-supervised                         | -0.11 (0.07)    |
| Shift10                                     | 0.34*** (0.07)  |
| Shift20                                     | 0.42*** (0.04)  |
| Shift30                                     | 0.35*** (0.03)  |
| Trial : VOT                                 | 0.75*** (0.07)  |
| Trial : unsup-vs-mixed                      | -0.10 (0.13)    |
| Trial : unsup-vs-supervised                 | 0.06 (0.13)     |
| VOT : unsup-vs-mixed                        | -0.05 (0.05)    |
| VOT : unsup-vs-supervised                   | -0.08 (0.05)    |
| Trial : Shift10                             | -0.07 (0.13)    |
| Trial : Shift20                             | 0.02 (0.08)     |
| Trial : Shift30                             | -0.14* (0.05)   |
| VOT : Shift10                               | 0.01 (0.05)     |
| VOT : Shift20                               | -0.07* (0.03)   |
| VOT : Shift30                               | -0.09*** (0.02) |
| unsup-vs-mixed : Shift10                    | -0.06 (0.10)    |
| unsup-vs-supervised : Shift10               | -0.02 (0.10)    |
| unsup-vs-mixed : Shift20                    | -0.05 (0.06)    |
| unsup-vs-supervised : Shift20               | -0.03 (0.06)    |
| unsup-vs-mixed : Shift30                    | -0.02 (0.04)    |
| unsup-vs-supervised : Shift30               | -0.04 (0.04)    |
| Trial : VOT : unsup-vs-mixed                | 0.07 (0.09)     |
| Trial : VOT : unsup-vs-supervised           | -0.09 (0.09)    |
| Trial : VOT : Shift10                       | 0.10 (0.09)     |
| Trial : VOT : Shift20                       | -0.04 (0.05)    |
| Trial : VOT : Shift30                       | -0.01 (0.04)    |
| Trial : unsup-vs-mixed : Shift10            | -0.15 (0.19)    |
| Trial : unsup-vs-supervised : Shift10       | 0.14 (0.18)     |
| Trial : unsup-vs-mixed : Shift20            | -0.08 (0.11)    |
| Trial : unsup-vs-supervised : Shift20       | 0.18 (0.11)     |
| Trial : unsup-vs-mixed : Shift30            | -0.20** (0.08)  |
| Trial : unsup-vs-supervised : Shift30       | 0.15* (0.08)    |
| VOT : unsup-vs-mixed : Shift10              | -0.05 (0.07)    |
| VOT : unsup-vs-supervised : Shift10         | 0.02 (0.07)     |
| VOT : unsup-vs-mixed : Shift20              | 0.02 (0.04)     |
| VOT : unsup-vs-supervised : Shift20         | -0.07 (0.04)    |
| VOT : unsup-vs-mixed : Shift30              | 0.02 (0.03)     |
| VOT : unsup-vs-supervised : Shift30         | -0.03 (0.03)    |
| Trial : VOT : unsup-vs-mixed : Shift10      | -0.16 (0.13)    |
| Trial : VOT : unsup-vs-supervised : Shift10 | 0.04 (0.13)     |
| Trial : VOT : unsup-vs-mixed : Shift20      | -0.03 (0.07)    |
| Trial : VOT : unsup-vs-supervised : Shift20 | 0.03 (0.08)     |
| Trial : VOT : unsup-vs-mixed : Shift30      | -0.05 (0.05)    |
| Trial : VOT : unsup-vs-supervised : Shift30 | 0.02 (0.05)     |
| Observations                                | 50,724          |

Note: \*p<0.05; \*\*p<0.01; \*\*\*p<0.001