Lecture 4
Sound Systems of Language
Phonetics and phonology:
- The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
Technologies:
- Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
- Text-to-Speech (TTS) systems take text as input and produce speech
Letters and Sounds
same spelling = different sounds
- o: comb, tomb, bomb
- oo: blood, food, good
- c: court, center, cheese
- s: reason, surreal, shy
same sound = different spellings
- [i]: sea, see, scene, receive, thief
- [s]: cereal, same, miss
- [u]: true, few, choose, lieu, do
- [ay]: prime, buy, rhyme, lie
combination of letters = single sound
single letter = combination of sounds
- x: exit, Texas
- u: use, music
‘silent’ letters
Articulators
Articulators in action
Vocal fold vibration
Places of articulation
Articulatory parameters for English consonants (in ARPAbet)
American English vowel space
Syllables
Syllabification is important for
- pronunciation: deny/denim
- speaking rate calculation: syllables per second
- word recognition in ASR
(onset) + nucleus + (coda)
Lexical stress: primary, secondary, tertiary
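A rough sketch of the speaking-rate and stress points above: every syllable has exactly one vowel nucleus, so counting vowel phones in a transcription counts syllables. The code assumes CMUdict-style ARPAbet with a stress digit on each vowel (1 = primary, 2 = secondary, 0 = unstressed); the function names, the two transcriptions, and the 0.5 s duration are illustrative and not taken from the lecture.

```python
# ARPAbet vowel symbols; each vowel phone is treated as one syllable nucleus.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
    "IH", "IY", "OW", "OY", "UH", "UW",
}

def is_vowel(phone):
    # Drop a trailing stress digit (0, 1 or 2) before the lookup.
    return phone.rstrip("012") in ARPABET_VOWELS

def count_syllables(phones):
    # One nucleus per syllable, so the vowel count is the syllable count.
    return sum(1 for p in phones if is_vowel(p))

def stress_pattern(phones):
    # Stress digit of each nucleus, in order.
    return [int(p[-1]) for p in phones if is_vowel(p) and p[-1].isdigit()]

def speaking_rate(phones, duration_seconds):
    # Syllables per second for one utterance.
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return count_syllables(phones) / duration_seconds

# Illustrative transcriptions (assumed, not from the slides).
denim = ["D", "EH1", "N", "IH0", "M"]
deny = ["D", "IH0", "N", "AY1"]

print(stress_pattern(denim))      # stress on the first syllable
print(stress_pattern(deny))       # stress on the second syllable
print(speaking_rate(denim, 0.5))  # syllables per second
```

The deny/denim pair differs mainly in which syllable carries primary stress, which is why syllabification matters for pronunciation.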
Phonological Rules
Not all instances of a given phone [x] sound/look alike
Phonological rules map phonemes in context to allophones, e.g.
- simple rules: /{t,d}/ --> [dx] / V’ _ V
- FSAs, FSTs
- declarative constraints: t: V’ _ V
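As a concrete instance of a context-dependent rule, the sketch below applies American English flapping to a phone sequence: /t/ or /d/ between a stressed vowel and a following vowel is rewritten as the flap, written DX in ARPAbet. It assumes CMUdict-style stress digits and treats only primary stress (digit 1) as the stressed context; the function names and transcriptions are hypothetical, and a plain loop stands in for the FST a real system would likely use.

```python
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
    "IH", "IY", "OW", "OY", "UH", "UW",
}

def is_vowel(phone):
    return phone.rstrip("012") in ARPABET_VOWELS

def is_stressed_vowel(phone):
    # Assumption: only primary stress (digit 1) counts as stressed.
    return is_vowel(phone) and phone.endswith("1")

def apply_flapping(phones):
    # Rewrite T or D as DX when the left neighbour is a stressed vowel
    # and the right neighbour is any vowel. Contexts are read from the
    # input sequence, so rewrites do not feed each other.
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("T", "D")
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            out[i] = "DX"
    return out

butter = ["B", "AH1", "T", "ER0"]   # T follows a stressed vowel
attack = ["AH0", "T", "AE1", "K"]   # T precedes the stressed vowel

print(apply_flapping(butter))  # T becomes DX
print(apply_flapping(attack))  # unchanged
```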
Allophones of /t/
What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:
Application: Word Pronunciation for TTS
Pronouncing dictionaries (the: [‘dhax],[‘dhiy])
Problems:
- Homographs (bass/bass, wind/wind, desert/desert)
- Abbreviation (dr., st.)
- Numbers (2125551212)
- Acronyms (NAACL, IDIAP)
- Morphological variation (unrelentingly)
- Proper names and unknown words
rules + dictionaries/dictionaries + rules
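To see why a dictionary alone is not enough, the sketch below normalizes two of the problem cases before any lookup: digit strings and ambiguous abbreviations. It reads a digit string one digit at a time (one plausible reading for a telephone number) and returns every candidate expansion of an abbreviation rather than choosing one. The tables and the function names are assumptions made for illustration; picking among candidates would need context that this sketch does not model.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Each abbreviation maps to all of its candidate expansions.
ABBREVIATIONS = {
    "dr.": ["doctor", "drive"],
    "st.": ["saint", "street"],
}

def read_digits(token):
    # Digit-by-digit reading, e.g. for a telephone number.
    return [DIGIT_WORDS[int(ch)] for ch in token]

def expand_token(token):
    # Return a list of candidate expansions; each candidate is a list of words.
    lower = token.lower()
    if lower and all(ch in "0123456789" for ch in lower):
        return [read_digits(lower)]
    if lower in ABBREVIATIONS:
        return [[word] for word in ABBREVIATIONS[lower]]
    return [[lower]]

print(expand_token("2125551212"))
print(expand_token("Dr."))
print(expand_token("cat"))
```

Returning all candidates keeps the ambiguity visible: "Dr." and "St." each have two readings, just as the homographs above have two pronunciations.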
Hybrid model:
- FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
- FSAs model morphology (e.g. reg-noun-stem + s)
- FSTs for pronunciation rules (e.g. s--> z)
- special rules to model name and acronym pronunciation
- default letter-to-sound rules for other words
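The layering of the hybrid model can be sketched with plain dictionaries standing in for the transducers: try the lexicon, then stem + s morphology with a voicing rule for the suffix, then fall back to default letter-to-sound rules. This is only an illustration of the control flow, not an FST implementation; the "dog" entry, the voiceless set, and the one-letter-at-a-time table are assumptions, and stems ending in sibilants (which take an extra vowel) are ignored.

```python
# Lexicon: the "cat" entry mirrors c:k a:ae t:t; "dog" is an added example.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

# Stem-final sounds after which the plural suffix stays [s].
VOICELESS = {"p", "t", "k", "f", "th"}

# Crude default letter-to-sound table (hypothetical, one letter at a time).
LETTER_TO_SOUND = {
    "a": ["ae"], "b": ["b"], "c": ["k"], "d": ["d"], "e": ["eh"],
    "f": ["f"], "g": ["g"], "h": ["hh"], "i": ["ih"], "j": ["jh"],
    "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "o": ["aa"],
    "p": ["p"], "q": ["k"], "r": ["r"], "s": ["s"], "t": ["t"],
    "u": ["ah"], "v": ["v"], "w": ["w"], "x": ["k", "s"], "y": ["y"],
    "z": ["z"],
}

def plural_suffix(stem_phones):
    # s --> z unless the stem ends in a voiceless sound.
    return ["s"] if stem_phones[-1] in VOICELESS else ["z"]

def pronounce(word):
    word = word.lower()
    if word in LEXICON:                      # 1. lexicon lookup
        return list(LEXICON[word])
    stem = word[:-1]
    if word.endswith("s") and stem in LEXICON:
        stem_phones = LEXICON[stem]          # 2. morphology: stem + s
        return stem_phones + plural_suffix(stem_phones)
    # 3. default letter-to-sound rules; unknown characters are skipped
    return [p for ch in word for p in LETTER_TO_SOUND.get(ch, [])]

print(pronounce("cats"))
print(pronounce("dogs"))
print(pronounce("fax"))
```

Ordering the layers from most specific to most general means the fallback rules only fire for words that neither the lexicon nor the morphology can explain.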
Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
- Rhyming analogy: varoom/room, todo/dodo
- Linguistic origin: Infiniti, vingt, Perez
- Abbreviation expansion:
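Rhyming analogy can be sketched in its simplest form: if an unknown word ends in the spelling of a known word, borrow that word's pronunciation for the ending and guess the rest letter by letter. The code below covers only this whole-word-suffix case (varoom/room); a pair like todo/dodo, where only part of the known word is shared, would need letter-to-phone alignment that is not attempted here. The lexicon entries, the tiny letter table, and the function name are all assumptions for illustration.

```python
# Known pronunciations (illustrative ARPAbet-style entries).
KNOWN_WORDS = {
    "room": ["r", "uw", "m"],
}

# Tiny hypothetical letter-to-sound table for the unmatched prefix.
PREFIX_LETTERS = {
    "v": ["v"],
    "a": ["ax"],
}

def pronounce_by_analogy(word, lexicon, letter_to_sound):
    # Try the longest proper suffix first, then shorter ones.
    for start in range(1, len(word)):
        suffix = word[start:]
        if suffix in lexicon:
            prefix = [p for ch in word[:start]
                      for p in letter_to_sound.get(ch, [])]
            return prefix + lexicon[suffix]
    return None  # no known word rhymes by spelling

print(pronounce_by_analogy("varoom", KNOWN_WORDS, PREFIX_LETTERS))
print(pronounce_by_analogy("blick", KNOWN_WORDS, PREFIX_LETTERS))
```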
Summary
Phones realize phonemes in different contexts
- Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
Versatile FSTs can model phonological as well as morphological and spelling systems
Many creative approaches toward pronunciation modeling for TTS
Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)