Phonetic features in ASR: intensive course, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, 22–26 March 1999




Phonetic features in ASR

  • Intensive course, Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, 22–26 March 1999

  • Jacques Koreman, Institute of Phonetics, University of the Saarland, P.O. Box 15 11 50, D-66041 Saarbrücken, Germany. E-mail: jkoreman@coli.uni-sb.de


Organisation of the course

  • Tuesday – Friday:
    - First half of each session: theory
    - Second half of each session: practice

  • Interruptions invited!!!



Overview of the course

  • 1. Variability in the signal

  • 2. Phonetic features in ASR

  • 3. Deriving phonetic features from the acoustic signal by a Kohonen network

  • 4. ICSLP’98: “Exploiting transitions and focussing on linguistic properties for ASR”

  • 5. ICSLP’98: “Do phonetic features help to improve consonant identification in ASR?”



The goal of ASR systems

  • Input: spectral description of the microphone signal, typically
    - energy in band-pass filters
    - LPC coefficients
    - cepstral coefficients

  • Output: linguistic units, usually phones or phonemes (on the basis of which words can be recognised)
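To make the input side concrete, here is a minimal, purely illustrative front end (not the course's actual one): it computes band-pass filterbank energies from a frame's power spectrum and then cepstral coefficients as the DCT of their logs. All sizes and the test tone are invented; real systems use mel-spaced filters and an FFT.

```python
import math

def power_spectrum(frame):
    # Naive DFT power spectrum (O(n^2); fine for a 64-sample demo frame).
    n = len(frame)
    spec = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(-frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec.append(re * re + im * im)
    return spec

def band_energies(spec, n_bands=8):
    # Energy in equal-width band-pass filters over the spectrum.
    width = len(spec) // n_bands
    return [sum(spec[b * width:(b + 1) * width]) for b in range(n_bands)]

def cepstral_coefficients(energies, n_ceps=5):
    # DCT-II of the log band energies yields cepstral coefficients.
    logs = [math.log(e + 1e-10) for e in energies]
    n = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
            for i in range(n_ceps)]

# One frame: a 1 kHz tone sampled at 8 kHz (64 samples = 8 full periods).
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
energies = band_energies(power_spectrum(frame))
print(cepstral_coefficients(energies))
```

For this pure tone, almost all energy falls into the band containing DFT bin 8, and the cepstral vector summarises the spectral envelope in a handful of numbers.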



Variability in the signal (1)

  • Main problem in ASR: variability in the input signal. Example: /k/ has very different realisations in different contexts. Its place of articulation varies from velar before back vowels to pre-velar before front vowels (own articulation of "keep", "cool").



Variability in the signal (2)

  • Main problem in ASR: variability in the input signal. Example: /g/ in its canonical form is sometimes realised as a fricative or approximant, e.g. intervocalically (OE. regen > E. rain). In Danish, this happens to all intervocalic voiced plosives; also, voiceless plosives become voiced.



Variability in the signal (3)

  • Main problem in ASR: variability in the input signal. Example: /h/ has very different realisations in different contexts. It can be considered a voiceless realisation of the surrounding vowels. (spectrograms "ihi", "aha", "uhu")



Variability in the signal (3a)



Variability in the signal (4)

  • Main problem in ASR: variability in the input signal. Example: deletion of segments due to articulatory overlap. Friction is superimposed on the vowel signal.

  • (spectrogram G.“System”)



Variability in the signal (4a)



Variability in the signal (5)

  • Main problem in ASR: variability in the input signal. Example: the same vowel /a:/ is realised differently depending on its context.

  • (spectrogram “aba”, “ada”, “aga”)



Variability in the signal (5a)



Modelling variability

  • Hidden Markov models can represent the variable signal characteristics of phones
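A minimal sketch of the idea: each phone is an HMM state with an emission distribution over (here, discrete) acoustic symbols, and Viterbi decoding finds the most likely state sequence despite variable observations. The phones, symbols, and probabilities below are all invented for illustration, not taken from the course material.

```python
# Toy discrete HMM: states are phones, observations are quantised acoustic symbols.
states = ["k", "a"]
start = {"k": 0.6, "a": 0.4}
trans = {"k": {"k": 0.5, "a": 0.5}, "a": {"k": 0.3, "a": 0.7}}
# Emission probabilities: "burst"/"noise" favour /k/, "voiced" favours /a/.
emit = {
    "k": {"burst": 0.6, "noise": 0.3, "voiced": 0.1},
    "a": {"burst": 0.1, "noise": 0.2, "voiced": 0.7},
}

def viterbi(obs):
    # Each column stores, per state, the best path probability and a backpointer.
    v = [{s: (start[s] * emit[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        v.append({s: max(((v[-1][p][0] * trans[p][s] * emit[s][o], p)
                          for p in states), key=lambda x: x[0])
                  for s in states})
    # Trace back from the best final state.
    best = max(states, key=lambda s: v[-1][s][0])
    path = [best]
    for col in reversed(v[1:]):
        path.append(col[path[-1]][1])
    return list(reversed(path))

print(viterbi(["burst", "noise", "voiced", "voiced"]))  # → ['k', 'k', 'a', 'a']
```

Even though the second frame ("noise") is ambiguous, the transition structure lets the model keep it inside the /k/ region, which is exactly how HMMs absorb signal variability.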



Lexicon and language model (1)

  • Linguistic knowledge about phone sequences (lexicon, language model) improves word recognition

  • Without linguistic knowledge, low phone accuracy
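As a sketch of why linguistic knowledge helps, the toy decoder below combines per-segment phone posteriors from an acoustic model with a hypothetical lexicon and unigram word priors; the words, phone sets, and all numbers are invented for illustration.

```python
import math

# Hypothetical lexicon with unigram priors (invented numbers).
lexicon = {"cat": ["k", "a", "t"], "cap": ["k", "a", "p"], "at": ["a", "t"]}
prior = {"cat": 0.5, "cap": 0.3, "at": 0.2}

# Noisy per-segment phone posteriors from the acoustic model (3 segments).
posteriors = [
    {"k": 0.5, "a": 0.3, "t": 0.1, "p": 0.1},
    {"k": 0.1, "a": 0.6, "t": 0.2, "p": 0.1},
    {"k": 0.1, "a": 0.1, "t": 0.35, "p": 0.45},
]

def best_word(posteriors):
    # Combine acoustic evidence with the language-model prior in log space.
    scores = {}
    for word, phones in lexicon.items():
        if len(phones) != len(posteriors):
            continue  # crude length matching, enough for this sketch
        acoustic = sum(math.log(p[ph]) for p, ph in zip(posteriors, phones))
        scores[word] = acoustic + math.log(prior[word])
    return max(scores, key=scores.get)

print(best_word(posteriors))  # → cat
```

Segment-by-segment argmax would read the last segment as /p/ (giving "cap"), and "cap" even scores higher acoustically; the word prior flips the decision to "cat". This is the sense in which the lexicon and language model rescue low phone accuracy.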



Lexicon and language model (2)

  • Using a lexicon and/or language model is not a top-down solution to all problems: sometimes pragmatic knowledge is needed.

  • Example: 



Lexicon and language model (3)

  • Using a lexicon and/or language model is not a top-down solution to all problems: sometimes pragmatic knowledge is needed.

  • Example: []



CONCLUSIONS

  • The acoustic parameters (e.g. MFCCs) are highly variable.

  • We must try to improve phone accuracy by extracting linguistic information.

  • Rationale: word recognition rates will increase if phone accuracy improves

  • BUT: not all our problems can be solved



Phonetic features in ASR

  • Assumption: phone accuracy can be improved by deriving phonetic features from the spectral representation of the speech signal

  • What are phonetic features?



A phonetic description of sounds

  • The articulatory organs



A phonetic description of sounds

  • The articulation of consonants



A phonetic description of sounds

  • The articulation of vowels



Phonetic features: IPA

  • IPA (International Phonetic Alphabet) chart - consonants and vowels - only phonemic distinctions (http://www.arts.gla.ac.uk/IPA/ipa.html)



The IPA chart (consonants)



The IPA chart (other consonants)



The IPA chart (non-pulm. cons.)



The IPA chart (vowels)



The IPA chart (diacritics)



IPA features (obstruents)



IPA features (sonorants)



IPA features (vowels)



Phonetic features

  • Phonetic features:
    - different systems (JFH, SPE, articulatory features)
    - capture the distinction between "natural classes" of sounds which undergo the same phonological processes



SPE features (obstruents)

  (Values: 1 = +, -1 = -, 0 = unspecified; SAMPA-style phone labels. Columns: cns consonantal, syl syllabic, nas nasal, son sonorant, low, hig high, cen central, bac back, rou round, ant anterior, cor coronal, cnt continuant, voi voice, lat lateral, str strident, ten tense.)

          cns  syl  nas  son  low  hig  cen  bac  rou  ant  cor  cnt  voi  lat  str  ten
  p0        1   -1   -1   -1   -1    0    0    0   -1    0    0   -1   -1   -1   -1    1
  b0        1   -1   -1   -1   -1    0    0    0   -1    0    0   -1    1   -1   -1   -1
  p         1   -1   -1   -1   -1   -1    0   -1   -1    1   -1   -1   -1   -1   -1    1
  b         1   -1   -1   -1   -1   -1    0   -1   -1    1   -1   -1    1   -1   -1   -1
  tden      1   -1   -1   -1   -1   -1    0   -1   -1    1    1   -1   -1   -1   -1    1
  t         1   -1   -1   -1   -1   -1    0   -1   -1    1    1   -1   -1   -1   -1    1
  d         1   -1   -1   -1   -1   -1    0   -1   -1    1    1   -1    1   -1   -1   -1
  k         1   -1   -1   -1   -1    1    0    1   -1   -1   -1   -1   -1   -1   -1    1
  g         1   -1   -1   -1   -1    1    0    1   -1   -1   -1   -1    1   -1   -1   -1
  f         1   -1   -1   -1   -1   -1    0   -1   -1    1   -1    1   -1   -1    1    1
  vfri      1   -1   -1   -1   -1   -1    0   -1   -1    1   -1    1    1   -1    1   -1
  T         1   -1   -1   -1   -1   -1    0   -1   -1    1    1    1   -1   -1   -1    1
  Dfri      1   -1   -1   -1   -1   -1    0   -1   -1    1    1    1    1   -1   -1   -1
  s         1   -1   -1   -1   -1   -1    0   -1   -1    1    1    1   -1   -1    1    1
  z         1   -1   -1   -1   -1   -1    0   -1   -1    1    1    1    1   -1    1   -1
  S         1   -1   -1   -1   -1    1    0   -1   -1   -1    1    1   -1   -1    1    1
  Z         1   -1   -1   -1   -1    1    0   -1   -1   -1    1    1    1   -1    1   -1
  C         1   -1   -1   -1   -1    1    0   -1   -1   -1   -1    1   -1   -1    1    1
  x         1   -1   -1   -1   -1    1    0    1   -1   -1   -1    1   -1   -1    1    1



SPE features (sonorants)

          cns  syl  nas  son  low  hig  cen  bac  rou  ant  cor  cnt  voi  lat  str  ten
  m         1   -1    1    1   -1   -1    0   -1   -1    1   -1   -1    1   -1   -1    0
  n         1   -1    1    1   -1   -1    0   -1   -1    1    1   -1    1   -1   -1    0
  J         1   -1    1    1   -1    1    0   -1   -1   -1   -1   -1    1   -1   -1    0
  N         1   -1    1    1   -1    1    0    1   -1   -1   -1   -1    1   -1   -1    0
  l         1   -1   -1    1   -1   -1    0   -1   -1    1    1    1    1    1   -1    0
  L         1   -1   -1    1   -1    1    0   -1   -1   -1   -1    1    1    1   -1    0
  ralv      1   -1   -1    1   -1   -1    0   -1   -1    1    1    1    1   -1   -1    0
  Ruvu      1   -1   -1    1   -1   -1    0    1   -1   -1   -1    1    1   -1   -1    0
  rret      1   -1   -1    1   -1   -1    0   -1   -1   -1    1    1    1   -1   -1    0
  j        -1   -1   -1    1   -1    1    0   -1   -1   -1   -1    1    1   -1   -1    0
  vapr     -1   -1   -1    1   -1   -1    0   -1   -1    1   -1    1    1   -1   -1    0
  w        -1   -1   -1    1   -1    1    0    1    1    1   -1    1    1   -1   -1    0
  h        -1   -1   -1    1    1   -1    0   -1   -1   -1   -1    1   -1   -1   -1    0
  XXX       0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0



SPE features (vowels)

          cns  syl  nas  son  low  hig  cen  bac  rou  ant  cor  cnt  voi  lat  str  ten
  i        -1    1   -1    1   -1    1   -1   -1   -1   -1   -1    1    1   -1   -1    1
  I        -1    1   -1    1   -1    1   -1   -1   -1   -1   -1    1    1   -1   -1   -1
  e        -1    1   -1    1   -1   -1   -1   -1   -1   -1   -1    1    1   -1   -1    1
  E        -1    1   -1    1   -1   -1   -1   -1   -1   -1   -1    1    1   -1   -1   -1
  {        -1    1   -1    1    1   -1   -1   -1   -1   -1   -1    1    1   -1   -1   -1
  a        -1    1   -1    1    1   -1   -1   -1   -1   -1   -1    1    1   -1   -1    1
  y        -1    1   -1    1   -1    1   -1   -1    1   -1   -1    1    1   -1   -1    1
  Y        -1    1   -1    1   -1    1   -1   -1    1   -1   -1    1    1   -1   -1   -1
  2        -1    1   -1    1   -1   -1   -1   -1    1   -1   -1    1    1   -1   -1    1
  9        -1    1   -1    1   -1   -1   -1   -1    1   -1   -1    1    1   -1   -1   -1
  A        -1    1   -1    1    1   -1   -1    1   -1   -1   -1    1    1   -1   -1   -1
  Q        -1    1   -1    1    1   -1   -1    1    1   -1   -1    1    1   -1   -1   -1
  V        -1    1   -1    1   -1   -1   -1    1   -1   -1   -1    1    1   -1   -1   -1
  O        -1    1   -1    1   -1   -1   -1    1    1   -1   -1    1    1   -1   -1   -1
  o        -1    1   -1    1   -1   -1   -1    1    1   -1   -1    1    1   -1   -1    1
  U        -1    1   -1    1   -1    1   -1    1    1   -1   -1    1    1   -1   -1   -1
  u        -1    1   -1    1   -1    1   -1    1    1   -1   -1    1    1   -1   -1    1
  Uschwa   -1    1   -1    1   -1   -1    1   -1    1   -1   -1    1    1   -1   -1   -1
  3        -1    1   -1    1   -1   -1    1   -1   -1   -1   -1    1    1   -1   -1    1
  @        -1    1   -1    1   -1   -1    1   -1   -1   -1   -1    1    1   -1   -1   -1
  6        -1    1   -1    1    1   -1    1   -1   -1   -1   -1    1    1   -1   -1   -1
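Feature matrices like the ones above lend themselves to a simple programmatic representation: a phone is a vector of feature values, and a natural class is the set of phones matching a partial feature specification. A hedged sketch, restricted to a small invented subset of the phones and four feature columns:

```python
# A subset of the SPE feature matrix above (1 = +, -1 = -, 0 = unspecified),
# restricted to four feature columns for brevity.
FEATURES = ["son", "voi", "cor", "cnt"]
PHONES = {
    "p": [-1, -1, -1, -1],
    "b": [-1,  1, -1, -1],
    "t": [-1, -1,  1, -1],
    "d": [-1,  1,  1, -1],
    "s": [-1, -1,  1,  1],
    "z": [-1,  1,  1,  1],
    "n": [ 1,  1,  1, -1],
}

def natural_class(**spec):
    # All phones whose features match the given values.
    idx = {f: i for i, f in enumerate(FEATURES)}
    return sorted(p for p, v in PHONES.items()
                  if all(v[idx[f]] == val for f, val in spec.items()))

print(natural_class(son=-1, voi=1))  # → ['b', 'd', 'z']  (voiced obstruents)
```

A phonological process such as intervocalic voicing can then be stated once over a class (here [-son, +voi] or its voiceless counterpart) instead of once per phone.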



CONCLUSION



Kohonen networks

  • Kohonen networks are unsupervised neural networks

  • Our Kohonen networks take vectors of acoustic parameters (MFCC_E_D) as input and output phonetic feature vectors

  • Network size: 50 x 50 neurons



Training the Kohonen network

  • 1. Self-organisation results in a phonotopic map

  • 2. Phone calibration attaches array of phones to each winning neuron

  • 3. Feature calibration replaces array of phones by array of phonetic feature vectors

  • 4. Averaging of phonetic feature vectors for each neuron



Mapping with the Kohonen network

  • Acoustic parameter vector belonging to one frame activates neuron

  • Weighted average of phonetic feature vector attached to winning neuron and K-nearest neurons is output
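The training and mapping steps above can be sketched with a tiny self-organising map. Everything here is shrunk and synthetic for illustration: a 5 x 5 grid with two-dimensional inputs and two invented clusters, whereas the course's network is 50 x 50 and trained on MFCC_E_D vectors.

```python
import math
import random

random.seed(0)
SIDE, DIM = 5, 2
weights = [[random.random() for _ in range(DIM)] for _ in range(SIDE * SIDE)]

def winner(x):
    # Best-matching unit: the neuron whose weight vector is closest to the input.
    return min(range(len(weights)),
               key=lambda i: sum((weights[i][d] - x[d]) ** 2 for d in range(DIM)))

def grid_dist(i, j):
    # Distance between two neurons on the grid (not in weight space).
    return math.hypot(i // SIDE - j // SIDE, i % SIDE - j % SIDE)

def train(data, epochs=20):
    # Self-organisation: pull the winner and its grid neighbours towards each
    # input, with a decaying learning rate and a shrinking neighbourhood.
    for e in range(epochs):
        rate = 0.5 * (1 - e / epochs)
        radius = 1 + (SIDE / 2) * (1 - e / epochs)
        for x in data:
            w = winner(x)
            for i in range(len(weights)):
                h = math.exp(-grid_dist(w, i) ** 2 / (2 * radius ** 2))
                for d in range(DIM):
                    weights[i][d] += rate * h * (x[d] - weights[i][d])

# Two synthetic clusters standing in for acoustically different phone classes.
data = [(random.gauss(0.2, 0.05), random.gauss(0.2, 0.05)) for _ in range(50)] + \
       [(random.gauss(0.8, 0.05), random.gauss(0.8, 0.05)) for _ in range(50)]
train(data)

# After training, the two classes activate different regions of the map.
print(winner((0.2, 0.2)), winner((0.8, 0.8)))
```

In a full implementation, calibration would attach an averaged phonetic feature vector to each neuron, and mapping would output a weighted average over the winner and its K nearest neurons rather than the single `winner` lookup shown here.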



Advantages of Kohonen networks

  • Reduction of feature dimensionality is possible

  • Mapping onto linguistically meaningful dimensions (phonetically less severe confusions)

  • Many-to-one mapping allows mapping of different allophones (acoustic variability) onto the same phonetic feature values

  • Automatic and fast mapping



Disadvantages of Kohonen networks

  • They need to be trained on manually segmented and labelled material

  • BUT: cross-language training has been shown to be successful



Hybrid ASR system



CONCLUSION

  • Acoustic-phonetic mapping extracts linguistically relevant information from the variable input signal.



ICSLP’98



INTRODUCTION



INTRODUCTION



DATA



DATA



DATA



EXPERIMENT 1: SYSTEM



EXPERIMENT 1: RESULTS



EXPERIMENT 1: CONCLUSIONS



EXPERIMENT 2: SYSTEM



EXPERIMENT 2: RESULTS



EXPERIMENT 2: CONCLUSIONS



EXPERIMENT 3: SYSTEM



EXPERIMENT 3: RESULTS



EXPERIMENT 3: CONCLUSIONS



INTERPRETATION (1)



INTERPRETATION (2)



REFERENCES (1)



REFERENCES (2)



SUMMARY



ICSLP’98



INTRODUCTION



DATA



DATA



DATA (1)



DATA (2)



SYSTEM ARCHITECTURE



CONFUSIONS BASELINE



CONFUSIONS MAPPING



ACIS =



BASELINE SYSTEM



MAPPING SYSTEM



AFFRICATES (1)



AFFRICATES (2)



APMS =



APMS =



CONSONANT CONFUSIONS



CONCLUSIONS



CONCLUSIONS



REFERENCES (1)



REFERENCES (2)



SUMMARY



THE END


