Eric Fosler-Lussier The Ohio State University Speech & Language Technologies Lab ITRW - Speech Recognition & Intrinsic Variation 20 May 2006
Fill in the blanks
- 3, 6, __, 12, 15, __, 21, 24
- A B C __ E F __ H
- You’re going to Toulouse? Drink a bottle of _____ for me!
- What’s the red object?
Filling in the blanks: missing data Missing data approaches have been used to integrate over noisy acoustics
Decode this! (brackets indicate options)
- s iy n y {ah,ax,axr,er} -> senior
- {l,r} {eh,ih,iy} s er ch -> research
- {ah,ax} s ow {s,sh,z,zh} {eh,ih,iy} {eh,ey} {t,d} -> associate
(dictionary pronunciation as marked by transcribers, Buckeye Corpus of Speech)
What do these tasks have in common? Recovering from erroneous information? - Context plays a big role in helping “clean up”
Recovering from incomplete information! - We should be treating pronunciation variation as a missing data problem
- Integrate over “missing” phonological features
- How much information do you need to decode words?
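The missing-data view can be sketched in code: treat each phone position as a set of candidate phones (or as missing entirely) and keep every lexicon word consistent with the partial evidence. The tiny lexicon and pronunciations below are illustrative stand-ins, not the actual Buckeye data:

```python
# Sketch: decoding a word from an underspecified phone sequence.
# Each position holds a *set* of candidate phones (None = missing);
# a word matches if its pronunciation is consistent with every slot.
# The lexicon below is purely illustrative.

LEXICON = {
    "senior": ["s", "iy", "n", "y", "er"],
    "seen":   ["s", "iy", "n"],
    "sinner": ["s", "ih", "n", "er"],
}

def matches(observed, pron):
    """observed: list of candidate-phone sets (None = missing slot)."""
    if len(observed) != len(pron):
        return False
    return all(slot is None or phone in slot
               for slot, phone in zip(observed, pron))

# "s iy n y {ah,ax,axr,er}" with the last position uncertain:
observed = [{"s"}, {"iy"}, {"n"}, {"y"}, {"ah", "ax", "axr", "er"}]
print([w for w, p in LEXICON.items() if matches(observed, p)])
```

Even with the final vowel ambiguous, only one word in the lexicon survives, which is the point: the context carried by the other positions fills in the blank.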
Outline Problems with phonetic representations of variation - Potential advantages of phonological features
Re-examining the role of phonetic transcription Phonological feature approaches to ASR - Feature attribute detection
- Feature combination methods
- Learning to (dis-)trust features
A challenge for the future
“The Case Against The Phoneme” Homage to Ostendorf (ASRU 99)
Four major indications that phonetic modeling of variation is not appropriate:
- Lack of progress on spontaneous speech WER
  - McAllaster et al (98): 50% improvement possible
  - Finke & Waibel (97): 6% WER reduction
- Independence of decisions in phone-based models
  - When pronunciation variation is modeled on a phone-by-phone level, unusual baseforms are often created
  - Word-based learning fails to generalize across words
- Lack of granularity
  - Triphone contexts mean a symbolic change in one phone can affect 9 HMM states (min 90 msec)
  - Much variation is already handled by triphone context
- Difficulty in transcription
  - Phonetic transcription is expensive and time-consuming
  - Many decisions are difficult for transcribers to make
Using phonological features Finer granularity - Some phonological changes don’t result in canonical phones for a language
- English: uw can sometimes be fronted (toot)
- Common enough: TIMIT introduced a special phone (ux)
- Symbol change loses all commonality between phones (uw->ux)
- Handling odd phonological effects
- Phone deletions: many “deletions” really leave small traces of coarticulation on neighboring segments
- E.g. vowel nasalization with nasal deletion
Features may provide a basis for cross-lingual recognition - International Phonetic Alphabet
Issues with phonological features Interlingua: “high vowels in English are not the same as high vowels in Japanese” - Richard Wright, lunch Wednesday, ICASSP 2006
Concept of “independent directions” false - Correlation of feature values
- Distances among feature dimensions are no longer Euclidean
Dealing with feature spreading Even more difficulty in transcription - (but: Karen Livescu’s group, JHU workshop 2006)
Articulatory vs. acoustic features - No two definitions are exactly the same (see Richard’s talk)
Phonetic transcription There have been a number of efforts to transcribe speech phonetically - American English
- TIMIT (4 hr read speech)
- Switchboard (4 hr spontaneous speech)
- Buckeye Corpus (40 hr spontaneous speech) http://buckeyecorpus.osu.edu
ASR researchers have found it difficult to utilize phonetic transcriptions directly
ASR & Phonetic Transcription Saraclar & Khudanpur (04) examined the means of acoustic models where canonical phone /x/ was transcribed as [y] over all pairs x:y - Compared means of x:y to x:x, y:y
- Data showed that x:y means often fell between x:x and y:y, sometimes closer to x:x
Another view: data from Buckeye Corpus
Can you trust transcription? Perceptual marking ≠ acoustic measurement - Can’t take transcription at face value
What are the transcribers trying to tell us? - This phone doesn’t sound like a canonical phone
- Perhaps we can look at commonalities across canonical/transcribed phone
- ae:eh -> front vowel (& not high?)
Phonological features may help us represent transcription differences.
Variation in single-phone changes Compared canonical vs. transcribed consonants with single-phone substitutions in Switchboard, Buckeye - Differences in manner, place, voicing counted
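A minimal sketch of this kind of count, assuming a hand-written feature table; the feature values and phone set here are illustrative, not the actual Switchboard/Buckeye inventory:

```python
# Sketch: counting which feature dimensions differ for a
# canonical -> transcribed consonant substitution.
# Feature values are illustrative textbook assignments.
FEATURES = {          # phone: (manner, place, voicing)
    "t": ("stop",      "alveolar", "voiceless"),
    "d": ("stop",      "alveolar", "voiced"),
    "s": ("fricative", "alveolar", "voiceless"),
    "k": ("stop",      "velar",    "voiceless"),
}

def feature_diffs(canonical, transcribed):
    dims = ("manner", "place", "voicing")
    return [d for d, a, b in zip(dims, FEATURES[canonical],
                                 FEATURES[transcribed]) if a != b]

print(feature_diffs("t", "d"))  # ['voicing']
print(feature_diffs("t", "s"))  # ['manner']
print(feature_diffs("t", "k"))  # ['place']
```

The appeal of this view is that most single-phone substitutions turn out to differ in only one or two dimensions, so the transcription disagreement carries structure rather than noise.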
Recent approaches to feature modeling in ASR Since the 1990s there has been increased interest in phonological feature modeling - Deng et al (92 ff), Kirchhoff (96 ff)
Current directions of research - Approaches for detecting phonological features from data
- Methods of combining phonological features
- Knowing when to ignore information
Frame-level decisions - Most common: artificial neural network methods
- Input: various flavors of spectral/cepstral representations
- Output: estimating posterior P(feature|acoustics) on a per-frame level
- Recent competitor: support vector machines
- Typically used for binary decision problems
Segmental-level decisions: integrate over time - HMM detectors
- Hybrid ANN/Dynamic Bayesian Network
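The frame-level ANN setup can be sketched as follows; the layer sizes, random weights, and manner inventory are placeholders for a trained network, not a real detector:

```python
import numpy as np

# Sketch: a one-hidden-layer network estimating per-frame posteriors
# P(manner | acoustics). Weights are random stand-ins for trained ones;
# 39 dimensions mimics MFCCs + deltas + double-deltas.
rng = np.random.default_rng(0)
MANNER = ["stop", "fricative", "nasal", "vowel", "silence"]
W1, b1 = rng.normal(size=(39, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, len(MANNER))), np.zeros(len(MANNER))

def frame_posteriors(x):
    """x: (n_frames, 39) cepstral features -> (n_frames, n_classes)."""
    h = np.tanh(x @ W1 + b1)
    z = h @ W2 + b2
    e = np.exp(z - z.max(axis=1, keepdims=True))   # stable softmax
    return e / e.sum(axis=1, keepdims=True)

frames = rng.normal(size=(10, 39))
post = frame_posteriors(frames)
print(post.shape)           # one posterior distribution per frame
```

The softmax output gives a proper distribution over feature values at every frame, which is what the downstream combination methods consume.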
Binary vs. n-ary features Contrastive features can be described as either binary or n-ary - Binary: /t/ : +stop -fricative …
- N-ary: /t/ : manner=stop
No real conclusion on which is better - Binary more matched to SVM learning
- N-ary allows for discrimination among classes
- Should a segment be allowed to be +stop +fricative?
- Anecdotally (our lab) we find n-ary features slightly better
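The two encodings might be contrasted as follows for /t/; the attribute inventory is an illustrative subset, not a full feature system:

```python
# Sketch: the same phone /t/ under the two encodings.
# Binary: an independent +/- decision per attribute, which permits
# odd combinations such as +stop +fricative.
binary_t = {"stop": +1, "fricative": -1, "nasal": -1,
            "voiced": -1, "alveolar": +1}

# N-ary: exactly one value per feature dimension, so classes within
# a dimension compete and +stop +fricative cannot occur.
nary_t = {"manner": "stop", "place": "alveolar", "voicing": "voiceless"}
```

The binary form maps naturally onto a bank of two-class SVMs; the n-ary form lets a single classifier discriminate among all values of one dimension.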
Hierarchical representations Phonological features are not truly independent - Chang et al (01): Place prediction improves if manner is known
- ANN predicts P(place=x|manner=y,X) vs P(place=x|X)
- Suggests need for hierarchical detectors
- Rajamanohar & Fosler-Lussier (05): Cascading errors make chained decisions worse
- Better to jointly model P(place=x,manner=y|X), or even derive P(place=x|X) from phone probabilities
- Frankel et al (04): Hierarchy can be integrated as additional dependencies in DBN
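Deriving feature posteriors from phone posteriors, as suggested above, is a simple marginalization; the phone-to-place map and probabilities below are illustrative:

```python
# Sketch: derive P(place=x|X) from phone probabilities by summing the
# posterior mass of all phones sharing that place. The phone set and
# posteriors here are illustrative.
PLACE = {"p": "labial", "b": "labial", "t": "alveolar",
         "d": "alveolar", "k": "velar", "g": "velar"}

def place_posteriors(phone_post):
    """phone_post: dict phone -> P(phone|X) for one frame."""
    out = {}
    for phone, p in phone_post.items():
        out[PLACE[phone]] = out.get(PLACE[phone], 0.0) + p
    return out

pp = place_posteriors({"p": 0.1, "b": 0.2, "t": 0.4,
                       "d": 0.1, "k": 0.1, "g": 0.1})
print(pp)
```

Because the place classes partition the phone set, the derived values still sum to one, and no chained place-given-manner decision is needed, avoiding the cascading-error problem.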
Combining features into higher-level structures Once you have (frame-level) estimates of phonological features, need to combine - Temporal integration: Markov structures
- Phonetic spatial integration: combining into higher-level units (phones, syllables, words)
Differences in methodologies: - spatial first, then temporal
- joint/factored spatio-temporal integration
- phone-level temporal integration with spatial rescoring
Combining features into higher-level structures Tandem ANN/HMM Systems - ANN feature posterior estimates are used as replacements for MFCCs in a mixture-of-Gaussians HMM system
- We find decorrelation of features (via PCA) necessary to keep models well conditioned
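A sketch of that tandem preprocessing, assuming log-posterior inputs and a PCA rotation (a common recipe, but the details here are illustrative rather than a specific system's):

```python
import numpy as np

# Sketch: decorrelating ANN posteriors before handing them to a
# Gaussian-mixture HMM (the "tandem" setup). Fake posteriors stand in
# for real ANN outputs.
rng = np.random.default_rng(1)
post = rng.dirichlet(np.ones(20), size=1000)   # 1000 frames, 20 classes

logp = np.log(post + 1e-10)                    # compress dynamic range
logp -= logp.mean(axis=0)                      # center per dimension
cov = np.cov(logp, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)           # PCA via eigendecomposition
tandem = logp @ eigvec                         # decorrelated features
print(tandem.shape)
```

After the rotation the feature covariance is (numerically) diagonal, which keeps diagonal-covariance Gaussian mixtures well conditioned.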
Lattice rescoring with Landmarks - Maximum entropy models for local word discrimination
- SVMs used as local features for MaxEnt model.
Dynamic Bayesian Models - Model asynchrony as a hidden variable
- SVM outputs used as observations of features
Combining features into higher-level structures Conditional random fields - CRFs jointly model spatio-temporal integration
- Probability expressed in terms of indicator functions s (state), t (transition)
- Usually binary in NLP applications
- Frame-level ANN posteriors are bounded
- Probabilities can serve as observation feature functions
- sstop(/t/,x,i)=P(manner=stop|xi)
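One way to realize such an observation feature function in code, in the spirit of s_stop(/t/, x, i) = P(manner=stop | x_i); the detector posteriors and label names are illustrative:

```python
# Sketch: an ANN posterior serving as a CRF state feature function.
# The function fires only when the hypothesized label matches, and
# returns the (bounded) detector posterior rather than a 0/1 indicator.
def make_state_feature(label, manner):
    def s(y_i, posteriors, i):
        return posteriors[i]["manner"][manner] if y_i == label else 0.0
    return s

s_stop_t = make_state_feature("t", "stop")
frame_posts = [{"manner": {"stop": 0.8, "fricative": 0.1, "vowel": 0.1}}]
print(s_stop_t("t", frame_posts, 0))   # 0.8
print(s_stop_t("s", frame_posts, 0))   # 0.0
```

Because the posteriors lie in [0, 1], they slot into the CRF exactly where NLP applications put binary indicator functions, and the learned weights then decide how much each feature/phone pairing matters.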
Conditional Random Fields CRFs make no independence assumptions about input Entire label sequence is modeled jointly - Monophone feature CRF phone recognition similar to triphone HMM
Learning parameters (λ, μ) determines importance of feature/phone relationships - Implicit model of partial phonological underspecification
Slow to train
Underspecification All of these models learn what phonological information is important in higher-level processing - Ignoring “canonical” feature definitions for phone is a form of underspecification
- Traditional underspecification: some features are undefined for a particular phone
- Weighted models: partial underspecification
When can you ignore phonetic information? - Crucially, when it doesn’t help you disambiguate between word hypotheses
Underspecification Example: unstressed syllables tend to show more phonetic variation than stressed syllables - Experiment: reduce phonetic representation for unstressed syllables to manner class
- Allowing recognizer to choose best representation (phone/manner) during training (WSJ0):
- Minor degradation for clean speech (9.9 vs. 9.1 WER)
- Larger improvement in 10dB car noise (15.8 vs 13.0 WER)
Moral: we don’t need to have exact phonetic representation to decode words
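The lexicon reduction in the experiment above might be sketched like this; the manner map, syllabification, and example word are illustrative stand-ins, not the WSJ0 setup:

```python
# Sketch: reduce the phonetic representation of unstressed syllables
# to manner classes, keeping full phones for stressed syllables.
MANNER = {"s": "fricative", "ax": "vowel", "b": "stop",
          "aw": "vowel", "t": "stop"}

def underspecify(syllables):
    """syllables: list of (phones, stressed) pairs for one word."""
    out = []
    for phones, stressed in syllables:
        out.extend(phones if stressed else [MANNER[p] for p in phones])
    return out

# "about": unstressed [ax] + stressed [b aw t]
print(underspecify([(["ax"], False), (["b", "aw", "t"], True)]))
```

The reduced entry claims only what the unstressed syllable reliably carries (its manner skeleton), which is why the coarser representation holds up better in noise.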
Vision for the Future Acoustic-phonetic variation is difficult - Still significant cause of errors in ASR
Underspecified models give a new way of looking at the problem - Rather than the “change x to y” model
Challenge for the field: - Current techniques for accent modeling, intrinsic pronunciation variation separate
- Can we build a model that handles both?
Conclusions We have come quite a distance since 1999 - New methods for phonological feature detection
- New methods for feature integration
- New ways of thinking about variation: underspecification
Still have a long way to go - Integrating more knowledge sources
- Stress, prosody, word confusability
- Solving the pronunciation adaptation problem in a general way
Fin
An example feature grid
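The grid from the original slide is not reproduced here; the structure it illustrates pairs each phone with one value per feature dimension, along these (illustrative, textbook-style) lines:

```python
# Illustrative feature grid: phone -> (manner, place, voicing,
# height, frontness); None marks dimensions undefined for the phone
# (traditional underspecification). Values are a sketch, not the
# grid from the slide.
GRID = {
    "p":  ("stop",      "labial",   "voiceless", None,   None),
    "m":  ("nasal",     "labial",   "voiced",    None,   None),
    "s":  ("fricative", "alveolar", "voiceless", None,   None),
    "iy": ("vowel",     None,       "voiced",    "high", "front"),
    "uw": ("vowel",     None,       "voiced",    "high", "back"),
}
```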