Automatic Speech Recognition

FR 4.7 Allgemeine Linguistik Institut für Phonetik, UdS (IPUS)

Overview

Variation in the realisation of words

Modelling the acoustic signal

Hidden-Markov-Modelling

Speech Recognition: Applications

The aim of an ASR-System

Recognition of an utterance …… on the basis of:

Variability in the signal affects both the signal modelling and the way the lexicon is structured.

Variation in the realisation of words

Phonological and phonetic processes can result in different realisations of the same word:

Variation in the realisation of words

Sound deletion

A sound contained in the “canonical” form (lexicon form) is not realised.

Variation in the realisation of words

Epenthesis

A sound NOT contained in the “canonical” form (lexicon form) is inserted.

Variation in the realisation of words

Assimilation

The (phonological) identity of a sound changes under the influence of the context (segmentally and prosodically conditioned). E.g.,

Variation in the realisation of words

The variation arising from phonological processes (deletion, epenthesis and assimilation) can be captured in the lexicon as pronunciation variants.
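A lexicon with pronunciation variants can be sketched as a simple mapping from words to lists of phone strings. The entries below are illustrative assumptions in SAMPA-like notation and are not taken from the slides; `LEXICON` and `words_for` are hypothetical names.

```python
# Hypothetical lexicon entries (SAMPA-like notation); not from the slides.
LEXICON = {
    "haben": [
        "h a: b @ n",  # canonical (lexicon) form
        "h a: b n",    # schwa deletion
        "h a: b m",    # deletion plus assimilation of /n/ to the preceding /b/
    ],
    "prince": [
        "p r I n s",    # canonical form
        "p r I n t s",  # epenthetic /t/ between /n/ and /s/
    ],
    "prints": [
        "p r I n t s",  # canonical form
    ],
}


def words_for(phones, lexicon=LEXICON):
    """Return every word that lists this phone string as a variant."""
    return [word for word, variants in lexicon.items() if phones in variants]


print(words_for("h a: b m"))
print(words_for("p r I n t s"))
```

The second lookup returns two candidate words for the same phone string. The acoustic signal alone cannot decide between them, so higher-level knowledge has to.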

“Top-down” helps “bottom-up”

Variation in the realisation of words

Ambiguities in the signal also arise from the phonetic variation that results from coarticulation between (neighbouring) sounds,

so:

Variation in the realisation of words

a sound ≠ a single acoustic pattern

Example: /h/ can look very different in different contexts. /h/ could be described as a “voiceless” realisation of the context vowels (in particular of the following vowel). See:

(Spectrograms “ihi”, “aha”, “uhu”: different realisations of /h/)

Variation in the realisation of words

Overlap of articulatory gestures

Example: The articulatory gesture for the vowel /ʏ/ overlaps with the gesture for the neighbouring fricatives.

(Spectrogram “Dezimalsystem”: no clear separation of the sounds)

Variation in the realisation of words

Articulatory Transitions

Example: At the boundary of the vowel, the realisation depends strongly on the articulation of the neighbouring sound.

(Spectrograms “aba”, “ada”, “aga”: variation in the vowel /a/)

Variation in the realisation of words

Markov-Modelling

MMs: A simple example

You start in state S (no emission) and go from there with a probability of p = 1 to state 1.

There you take a black ball from the container.

MMs: A simple example

Then you either go on to the 2nd state (p = 0.4) and take a red ball from the container, or you stay at the 1st container (p = 0.6) and take another black ball.

And so on, until you land in state E and have collected a number of coloured balls.
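The walk through the containers can be sketched in a few lines of Python. The transitions S → 1 (p = 1) and 1 → 2 (p = 0.4) come from the example, and staying in state 1 then has p = 0.6 because the outgoing probabilities must sum to 1. The probabilities for state 2 are assumptions made only for this sketch, as are the names `TRANSITIONS`, `EMISSIONS`, `sample_balls` and `sequence_probability`.

```python
import random

TRANSITIONS = {
    "S": {"1": 1.0},
    "1": {"1": 0.6, "2": 0.4},
    "2": {"2": 0.5, "E": 0.5},  # assumed values, not given in the example
}

# In a (visible) Markov model each state emits exactly one colour.
EMISSIONS = {"1": "black", "2": "red"}


def sample_balls(rng):
    """Walk from S to E and return the colours collected on the way."""
    state, balls = "S", []
    while True:
        successors = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
        if state == "E":
            return balls
        balls.append(EMISSIONS[state])


def sequence_probability(balls):
    """Probability of a colour sequence; the colours reveal the state path."""
    colour_to_state = {colour: state for state, colour in EMISSIONS.items()}
    path = ["S"] + [colour_to_state[ball] for ball in balls] + ["E"]
    prob = 1.0
    for src, dst in zip(path, path[1:]):
        prob *= TRANSITIONS[src].get(dst, 0.0)
    return prob


print(sample_balls(random.Random(0)))
print(sequence_probability(["black", "black", "red"]))
```

Because every colour belongs to exactly one state, the observed balls determine the state sequence, and the probability of the sequence is just the product of the transition probabilities along that path.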

Hidden Markov Modelling

HMMs: A simple example

You start in state S (no emission) and go from there to state 1 with a probability of p = 1.

You take a ball from the container which, this time, can be black, red or yellow.

HMMs: A simple example

Then you either go on to the 2nd state (p = 0.4) and take a ball from the container, or you stay at the 1st container (p = 0.6) and take another ball.

And so on, until you land in state E and have collected a number of coloured balls.
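Since each container can now yield any of the three colours, the observed balls no longer reveal which states were visited. The probability of a colour sequence has to be summed over all possible state paths, which the forward algorithm does efficiently. The sketch below is a minimal version under assumptions: only S → 1 (p = 1) and 1 → 2 (p = 0.4) come from the example, while the state 2 transitions and all emission probabilities are invented for illustration, as is the name `forward_probability`.

```python
TRANSITIONS = {
    "S": {"1": 1.0},
    "1": {"1": 0.6, "2": 0.4},
    "2": {"2": 0.5, "E": 0.5},  # assumed values, not given in the example
}

# Assumed emission probabilities: each container holds all three colours.
EMISSIONS = {
    "1": {"black": 0.7, "red": 0.2, "yellow": 0.1},
    "2": {"black": 0.1, "red": 0.6, "yellow": 0.3},
}


def forward_probability(balls):
    """Probability of the colour sequence, summed over all hidden paths."""
    # alpha[state]: probability of the balls seen so far, ending in state.
    alpha = {"S": 1.0}
    for ball in balls:
        new_alpha = {}
        for src, p_src in alpha.items():
            for dst, p_trans in TRANSITIONS[src].items():
                if dst == "E":  # E emits nothing, so it cannot explain a ball
                    continue
                p = p_src * p_trans * EMISSIONS[dst][ball]
                new_alpha[dst] = new_alpha.get(dst, 0.0) + p
        alpha = new_alpha
    # The walk must finish with a transition into the end state E.
    return sum(p * TRANSITIONS[s].get("E", 0.0) for s, p in alpha.items())


print(forward_probability(["black", "red"]))
print(forward_probability(["black", "black", "red"]))
```

For the three-ball sequence, two different hidden paths (1, 1, 2 and 1, 2, 2) contribute to the total, which is exactly what makes the states "hidden".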

HMMs: Hidden states

HMMs: Speech recognition

HMMs: Transitions

HMMs: Emissions

HMMs: more complex models

HMMs: Paucity of data?

HMMs: Speech recognition

HMMs: Lexicon & Language model

HMMs: Lexicon

HMMs: Language model

Literature:

Van Alphen, P. and D. van Bergem (1989). “Markov models and their application in speech recognition,” Proceedings Institute of Phonetic Sciences, University of Amsterdam 13, 1-26.

Holmes, J. (1988). Speech Synthesis and Recognition (ch. 8). Wokingham (Berks.): Van Nostrand Reinhold, 129-152.

Holmes, J. (1991). Spracherkennung und Sprachsynthese (ch. 8). München: Oldenbourg.

Cox, S. (1988). “Hidden Markov models for automatic speech recognition: theory and application,” Br. Telecom techn. Journal 6(2), 105-115.

Lee, K.-F. (1989). “Hidden Markov modelling: past, present, future,” Proc. Eurospeech 1989, vol. 1, 148-155.