Automatic Speech Recognition: FR 4.7 Allgemeine Linguistik, Institut für Phonetik, UdS (IPUS)


Date: 04.11.2017


Automatic Speech Recognition

  • FR 4.7 Allgemeine Linguistik Institut für Phonetik, UdS (IPUS)


Overview



Speech Recognition: Applications



The aim of an ASR-System

  • Recognition of an utterance …… on the basis of:

  • Variability in the signal affects both the signal modelling and the structure of the lexicon.



Variation in the realisation of words

  • Phonological and phonetic processes can result in different realisations of the same word:



Variation in the realisation of words

  • Sound deletion

  • A sound contained in the “canonical” form (lexicon form) is not realised.



Variation in the realisation of words

  • Epenthesis

  • A sound NOT contained in the “canonical” form (lexicon form) is inserted.



Variation in the realisation of words

  • Assimilation

  • The (phonological) identity of a sound changes under the influence of the context (segmentally and prosodically conditioned). E.g.,



Variation in the realisation of words

  • The variation arising from phonological processes (deletion, epenthesis and assimilation) can be captured in the lexicon as pronunciation variants.
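
A minimal sketch of such a lexicon follows. The words, the SAMPA-like transcriptions and the helper names (`LEXICON`, `words_for`, `canonical`) are illustrative assumptions, not entries from the course material. Each word lists its canonical form first, followed by variants that deletion, epenthesis or assimilation might produce.

```python
# Illustrative pronunciation lexicon (assumed entries, SAMPA-like notation).
# The first pronunciation of each word is its canonical (lexicon) form;
# the others are variants arising from phonological processes.
LEXICON = {
    "haben": [
        "h a: b @ n",  # canonical form
        "h a: b m",    # schwa deletion plus assimilation of /n/ to /m/
        "h a: m",      # further reduction
    ],
    "eins": [
        "aI n s",      # canonical form
        "aI n t s",    # epenthesis of /t/
    ],
}


def words_for(phones: str) -> list:
    """Return every word that lists `phones` as one of its pronunciations."""
    return [word for word, variants in LEXICON.items() if phones in variants]


def canonical(word: str) -> str:
    """Return the canonical (first-listed) pronunciation of `word`."""
    return LEXICON[word][0]
```

With a lexicon of this kind, a reduced realisation such as `"h a: m"` still maps back to the word "haben", at the cost of a larger search space.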



“Top-down” helps “bottom-up”

  • The lexicon and the language model (which captures the legal word sequences) together constitute the “top-down” processing. They help to resolve the ambiguities which arise during the signal processing stage (“bottom-up” processing), since an ASR system can only recognise those sound sequences which correspond to a possible sequence of lexicon entries.
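
The filtering effect of the lexicon can be sketched as follows. The example is a toy under stated assumptions: the three lexicon entries, the SAMPA-like phone symbols and the function name `recognisable` are hypothetical, and the language model is left out. The bottom-up stage is represented as a list of alternative phones per segment, and only those combinations that match a lexicon entry survive.

```python
from itertools import product

# Assumed toy lexicon: pronunciation -> word (SAMPA-like notation).
LEXICON = {
    "b a: n": "Bahn",
    "b aU m": "Baum",
    "p l a: n": "Plan",
}


def recognisable(candidates):
    """Keep only the phone sequences that correspond to a lexicon entry.

    `candidates` holds, for each segment, the alternative phones that the
    bottom-up signal analysis could not decide between.
    """
    hits = []
    for sequence in product(*candidates):
        key = " ".join(sequence)
        if key in LEXICON:
            hits.append((key, LEXICON[key]))
    return hits


# Bottom-up analysis is unsure about the first and the last segment.
ambiguous = [["b", "p"], ["a:"], ["n", "m"]]
print(recognisable(ambiguous))
```

Of the four phone sequences that the signal analysis allows here, only one matches a lexicon entry, so the ambiguity is resolved without any further acoustic evidence.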



Variation in the realisation of words

  • Ambiguities in the signal also arise from the phonetic variation which results from coarticulation between (neighbouring) sounds,

  • so:



Variation in the realisation of words

  • a sound ≠ a single acoustic pattern

  • Example: /h/ can look very different in different contexts. /h/ could be described as a “voiceless realisation” of the context vowels (in particular of the following vowel). See:

  • (Spectrograms “ihi”, “aha”, “uhu”: different realisations of /h/)



Variation in the realisation of words

  • Overlap of articulatory gestures

  • Example: The articulatory gesture for the vowel // overlaps with the gesture for the neighbouring fricatives.

  • (Spectrogram “Dezimalsystem”: no clear separation of the sounds)



Variation in the realisation of words

  • Articulatory Transitions

  • Example: At the boundary of the vowel, the realisation depends strongly on the articulation of the neighbouring sound.

  • (Spectrograms “aba”, “ada”, “aga”: variation in the vowel /a/)



Markov-Modelling



MMs: A simple example

  • You start in state S (no emission) and go from there with a probability of p = 1 to state 1.

  • There you take a black ball from the container.



MMs: A simple example

  • Then you either go on to the 2nd state (p = 0.4) and take a red ball from the container, or you stay at the 1st container (with only these two options, p = 0.6) and take another black ball.

  • And so on, until you land in state E and have collected a number of coloured balls.
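
The walk described above can be sketched in a few lines. Only the transitions S → 1 (p = 1) and 1 → 2 (p = 0.4) come from the example; the self-loop of 0.6 follows if these are the only two ways out of state 1, and everything about state 2 (its self-loop and its exit to E, both set to 0.5) is an assumption made so that the sketch can run. Because every state emits exactly one colour, the colour sequence reveals the state sequence.

```python
import random

# Transition probabilities. S -> 1 and 1 -> 2 are taken from the example;
# the values for state 2 are assumed for illustration.
TRANSITIONS = {
    "S": {"1": 1.0},
    "1": {"1": 0.6, "2": 0.4},
    "2": {"2": 0.5, "E": 0.5},  # assumed
}
# In a (visible) Markov model each state emits exactly one colour.
COLOUR = {"1": "black", "2": "red"}
STATE_OF = {colour: state for state, colour in COLOUR.items()}


def sequence_probability(colours):
    """Probability of drawing exactly this colour sequence from S to E."""
    path = ["S"] + [STATE_OF[c] for c in colours] + ["E"]
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= TRANSITIONS[src].get(dst, 0.0)
    return p


def sample(rng):
    """Walk from S to E once and return the colours collected on the way."""
    state, balls = "S", []
    while True:
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == "E":
            return balls
        balls.append(COLOUR[state])


print(sequence_probability(["black", "black", "red"]))
print(sample(random.Random(0)))
```

Under these assumed values, the sequence black, black, red has probability 1 × 0.6 × 0.4 × 0.5, and any sequence that starts with a red ball has probability 0, because state 2 cannot be reached directly from S.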



Hidden Markov Modelling



HMMs: A simple example

  • You start in state S (no emission) and go from there to state 1 with a probability of p = 1.

  • You take a ball from the container which, this time, can be black, red or yellow.



HMMs: A simple example

  • Then you either go on to the 2nd state (p = 0.4) and take a ball from the container, or you stay at the 1st container (with only these two options, p = 0.6) and take another ball.

  • And so on, until you land in state E and have collected a number of coloured balls.
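
Since every container can now yield any colour, a colour sequence no longer identifies the states that produced it, and its probability has to be summed over all possible state paths. A minimal sketch of this summation (the forward algorithm) is given below. The transition values for state 2 and all emission probabilities are assumptions chosen for illustration; only S → 1 (p = 1) and 1 → 2 (p = 0.4) come from the example.

```python
# Transition probabilities; the values for state 2 are assumed.
TRANS = {
    "S": {"1": 1.0},
    "1": {"1": 0.6, "2": 0.4},
    "2": {"2": 0.5, "E": 0.5},  # assumed
}
# Emission probabilities per container (all values assumed).
EMIT = {
    "1": {"black": 0.6, "red": 0.3, "yellow": 0.1},
    "2": {"black": 0.1, "red": 0.5, "yellow": 0.4},
}
STATES = list(EMIT)


def forward(observations):
    """Probability of the colour sequence, summed over all state paths."""
    if not observations:
        return 0.0
    # Initialisation: leave S and emit the first colour.
    alpha = {
        s: TRANS["S"].get(s, 0.0) * EMIT[s][observations[0]] for s in STATES
    }
    # Recursion: extend every partial path by one transition and emission.
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * TRANS[r].get(s, 0.0) for r in STATES)
            * EMIT[s][obs]
            for s in STATES
        }
    # Termination: the walk has to end in E.
    return sum(alpha[s] * TRANS[s].get("E", 0.0) for s in STATES)


print(forward(["black", "red"]))
print(forward(["black", "red", "red"]))
```

For the sequence black, red, red, two state paths contribute (1, 1, 2 and 1, 2, 2); the observer sees only the colours, which is what makes the states “hidden”.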



HMMs: Hidden states



HMMs: Speech recognition



HMMs: Transitions



HMMs: Emissions



HMMs: more complex models



HMMs: Paucity of data?



HMMs: Speech recognition



HMMs: Lexicon & Language model



HMMs: Lexicon



HMMs: Language model



Literature:

  • Van Alphen, P. and D. van Bergem (1989). “Markov models and their application in speech recognition,” Proceedings Institute of Phonetic Sciences, University of Amsterdam 13, 1-26.

  • Holmes, J. (1988). Speech Synthesis and Recognition (Ch. 8). Wokingham (Berks.): Van Nostrand Reinhold, 129-152.

  • Holmes, J. (1991). Spracherkennung und Sprachsynthese (Ch. 8). München: Oldenburg.



Literature:

  • Cox, S. (1988). “Hidden Markov models for automatic speech recognition: theory and application,” Br. Telecom techn. Journal 6(2), 105-115.

  • Lee, K.-F. (1989). “Hidden Markov modelling: past, present, future,” Proc. Eurospeech 1989, vol. 1, 148-155.


