Lecture 4 (CS4705): Sound Systems and Text-to-Speech


Date: 04.11.2017


Lecture 4


Sound Systems of Language

  • Phonetics

    • The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
  • Phonology

  • Technologies:

    • Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
    • Text-to-Speech (TTS) systems take text as input and produce speech


Letters and Sounds

  • same spelling = different sounds

    • o: comb, tomb, bomb
    • oo: blood, food, good
    • c: court, center, cheese
    • s: reason, surreal, shy
  • same sound = different spellings

    • [i]: sea, see, scene, receive, thief
    • [s]: cereal, same, miss
    • [u]: true, few, choose, lieu, do
    • [ay]: prime, buy, rhyme, lie
  • combination of letters = single sound

  • single letter = combination of sounds

    • x: exit, Texas
    • u: use, music
  • ‘silent’ letters
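
The letter/sound mismatches above can be made concrete with a tiny pronouncing table. This is a minimal sketch: the ARPAbet transcriptions are hand-entered for illustration (stress marks omitted), and the names `PRONUNCIATIONS` and `vowel_phones` are hypothetical, not part of the lecture.

```python
# Hand-entered ARPAbet transcriptions for illustration only (no stress marks).
PRONUNCIATIONS = {
    "comb": ["K", "OW", "M"],
    "tomb": ["T", "UW", "M"],
    "bomb": ["B", "AA", "M"],
    "sea": ["S", "IY"],
    "see": ["S", "IY"],
    "exit": ["EH", "G", "Z", "IH", "T"],
}

# Vowel symbols that occur in the table above (not the full ARPAbet set).
VOWELS = {"AA", "EH", "IH", "IY", "OW", "UW"}


def vowel_phones(word):
    """Return the vowel phones of a word from the illustrative table."""
    return [p for p in PRONUNCIATIONS[word] if p in VOWELS]


# Same spelling "o", three different vowel sounds.
o_sounds = {w: vowel_phones(w) for w in ("comb", "tomb", "bomb")}

# Same sounds, different spellings.
same_sound = PRONUNCIATIONS["sea"] == PRONUNCIATIONS["see"]

# Single letter "x" realized as two phones, so "exit" has more phones than letters.
x_expands = len(PRONUNCIATIONS["exit"]) > len("exit")

# Silent "b": "comb" has fewer phones than letters.
silent_letter = len(PRONUNCIATIONS["comb"]) < len("comb")
```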



Articulators



Articulators in action



Vocal fold vibration



Places of articulation



Articulatory parameters for English consonants (in ARPAbet)



American English vowel space



Acoustic landmarks



Syllables

  • Syllabification important for

    • pronunciation: deny/denim
    • speaking rate calculation: syllables per second
    • word recognition in ASR
  • (onset) + nucleus + (coda):

    • c a t
    • a
    • a t
    • t o
  • Lexical stress: primary, secondary, tertiary

    • telephone
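
The (onset) + nucleus + (coda) template and the syllables-per-second measure can be sketched as follows. This is a simplified illustration that assumes ARPAbet phones without stress marks, treats every vowel as exactly one nucleus, and ignores syllabic consonants; `split_syllable` and `speaking_rate` are hypothetical helper names.

```python
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def split_syllable(phones):
    """Split the phones of ONE syllable into (onset, nucleus, coda)."""
    for i, phone in enumerate(phones):
        if phone in ARPABET_VOWELS:
            return phones[:i], [phone], phones[i + 1:]
    raise ValueError("syllable has no vowel nucleus")


def speaking_rate(phones, duration_seconds):
    """Syllables per second, counting one syllable per vowel nucleus."""
    syllables = sum(1 for p in phones if p in ARPABET_VOWELS)
    return syllables / duration_seconds


cat = split_syllable(["K", "AE", "T"])   # onset + nucleus + coda
a = split_syllable(["AH"])               # nucleus only
at = split_syllable(["AE", "T"])         # nucleus + coda
to = split_syllable(["T", "UW"])         # onset + nucleus

# "telephone" has three vowel nuclei; spoken over an assumed 1.5 s.
rate = speaking_rate(["T", "EH", "L", "AH", "F", "OW", "N"], 1.5)
```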


Phonological Rules

  • Not all instances of a given phone [x] sound/look alike

  • Phoneme /x/ may have many allophones

  • Phonological rules map phonemes in context to allophones, e.g.

    • simple rules: /{t,d}/ --> [dx] / V’ _ V (flapping: /t/ or /d/ becomes a flap between a stressed vowel V’ and a following vowel)
    • FSAs, FSTs
    • declarative constraints: t: V’ _ V
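
A context-dependent rule of this kind can be sketched as a rewrite over a phone sequence. The example below assumes the rule is read as flapping (/t/ or /d/ realized as the flap [dx] between a stressed vowel and a following vowel) and assumes ARPAbet phones with CMUdict-style stress digits, where a trailing `1` marks primary stress. The function names are hypothetical.

```python
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def is_vowel(phone):
    """True for a vowel phone, with or without a stress digit."""
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    """True for a vowel carrying primary stress (digit 1)."""
    return is_vowel(phone) and phone.endswith("1")


def apply_flapping(phones):
    """Rewrite T or D as DX in the context: stressed vowel _ vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (
            phones[i] in ("T", "D")
            and is_stressed_vowel(phones[i - 1])
            and is_vowel(phones[i + 1])
        ):
            out[i] = "DX"
    return out


butter = apply_flapping(["B", "AH1", "T", "ER0"])   # context matches
attack = apply_flapping(["AH0", "T", "AE1", "K"])   # preceding vowel unstressed
```

Contexts are checked against the input sequence rather than the partially rewritten output, so the rule applies to all matching positions simultaneously.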


Allophones of /t/

  • What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:



Application: Word Pronunciation for TTS

  • Pronouncing dictionaries (the: [‘dhax],[‘dhiy])

  • Problems:

    • Homographs (bass/bass, wind/wind, desert/desert)
    • Abbreviations (dr., st.)
    • Numbers (2125551212)
    • Acronyms (NAACL, IDIAP)
    • Morphological variation (unrelentingly)
    • Proper names and unknown words
  • rules + dictionaries/dictionaries + rules
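
Two of the problems above, abbreviations and digit strings, are usually handled by normalizing text into ordinary words before dictionary lookup. The following is a rough sketch under strong assumptions: each abbreviation has a single expansion (real systems must choose between, say, "doctor" and "drive" from context), digit strings are read digit by digit as for a phone number, and homographs are not handled at all. All names are hypothetical.

```python
DIGIT_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]

# Assumption: one fixed expansion per abbreviation; ambiguity is ignored.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}


def normalize_token(token):
    """Expand one whitespace-delimited token into a list of plain words."""
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return [ABBREVIATIONS[lower]]
    if all(ch in "0123456789" for ch in token):
        # Read digit by digit, as one would read a phone number.
        return [DIGIT_WORDS[int(ch)] for ch in token]
    return [lower]


def normalize(text):
    """Normalize a whole string into a flat list of words."""
    words = []
    for token in text.split():
        words.extend(normalize_token(token))
    return words


spoken = normalize("Call Dr. Smith at 2125551212")
```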



  • Hybrid model:

    • FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
    • FSAs model morphology (e.g. reg-noun-stem + s)
    • FSTs for pronunciation rules (e.g. s --> z)
    • special rules to model name and acronym pronunciation
    • default letter-to-sound rules for other words
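
The first three layers of the hybrid model can be imitated with plain Python data structures. This is only a sketch of the idea, not a real FST toolkit: stems are stored as letter:phone pairs in the style of the c:k a:ae t:t entry, the morphology accepts only stem or stem + s, and the s --> z rule is approximated as "z unless the stem ends in a voiceless phone" (stems ending in sibilants are ignored). The `dog` entry, the phone sets, and the function names are assumptions added for illustration.

```python
# Lexicon: each regular noun stem is a sequence of letter:phone pairs.
LEXICON = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],
    "dog": [("d", "d"), ("o", "ao"), ("g", "g")],  # illustrative entry
}

# Assumed subset of voiceless phones; after these the plural stays [s].
VOICELESS = {"p", "t", "k", "f", "th"}


def transduce_stem(stem):
    """Output side of the letter:phone pairs for a known stem."""
    return [phone for _letter, phone in LEXICON[stem]]


def pronounce(word):
    """Pronounce stem or stem + s; return None for anything else."""
    if word in LEXICON:
        return transduce_stem(word)
    if word.endswith("s") and word[:-1] in LEXICON:
        phones = transduce_stem(word[:-1])
        suffix = "s" if phones[-1] in VOICELESS else "z"
        return phones + [suffix]
    return None  # caller would fall back to name rules or letter-to-sound


cats = pronounce("cats")
dogs = pronounce("dogs")
unknown = pronounce("varoom")
```

Returning `None` for words outside the lexicon leaves room for the later layers (name and acronym rules, then default letter-to-sound rules) to take over.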


Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words

  • Rhyming analogy: varoom/room, todo/dodo

  • Linguistic origin: Infiniti, vingt, Perez

  • Abbreviation expansion:
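
The rhyming-analogy approach listed above can be sketched as a search for the known word that shares the longest spelling suffix with the unknown word; its pronunciation would then be borrowed for the shared ending. This sketch covers only the search step, uses a two-word toy lexicon, and compares spellings rather than sounds, so its choice is only a guess. The function names are hypothetical.

```python
# Toy lexicon of known words (spellings only, for the search step).
KNOWN_WORDS = ["room", "dodo"]


def common_suffix_len(a, b):
    """Length of the longest shared spelling suffix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def best_rhyme(unknown, known_words):
    """Return (word, shared_suffix_length) for the closest spelling rhyme."""
    best_word, best_len = None, 0
    for word in known_words:
        n = common_suffix_len(unknown, word)
        if n > best_len:
            best_word, best_len = word, n
    return best_word, best_len


varoom_match = best_rhyme("varoom", KNOWN_WORDS)
todo_match = best_rhyme("todo", KNOWN_WORDS)
```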



Summary

  • Phones realize phonemes in different contexts

    • Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
  • Versatile FSTs can model phonological as well as morphological and spelling systems

  • Many creative approaches toward pronunciation modeling for TTS

  • Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)


