Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
Figure 7.6 Block diagram of a generic speech synthesis system. Input text is analysed phonetically to output concatenated strings of phonemes (top), and linguistically through a natural language processor (bottom left), firstly to adjust for any context-sensitive pronunciation changes, and secondly to add stress, intonation, pitch and volume changes to the phoneme string to account for sentence structure, punctuation and any words that may need to be emphasised.

such as interest, distaste, happiness, and so on. Modulation could be in the frequency domain (pitch), in the time domain (rate of speaking, pauses between words, and so on) or in more complex ways that change the perceived sounds. Anyone who, like the author, has endured listening to overlong monotonous monologues at academic conferences should value the presence of such intonation changes and modulations.

As a practical demonstration, read the sentence ‘Friends, Romans, countrymen, lend me your ears’ aloud. Next, re-read the sentence with uniform spacing between words, and in a monotone. Most would agree that this changes the impact of the sentence somewhat.

Having demonstrated the importance of stress, intonation, pitch and pace in speech, we now need to acknowledge that these aspects are subjective elements added by a speaker, and are not represented in the basic text. Thus even a TTS system that can reproduce single words so well that they are indistinguishable from a human speaker’s would fail to produce natural output when the words are joined together, unless the issues mentioned are solved.

For natural-sounding speech synthesis, we can say that, in general, as many of these elements as possible should be incorporated. Two methods exist for incorporating these elements: first, the use of a transcription system that includes the required information (to replace bare text input), and second, a method of extracting such information from the text automatically.
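As a rough illustration of the second approach (extracting prosodic information from the text automatically), the following Python sketch derives a pause length after each word from punctuation alone, and compares it with the uniform spacing used in the reading demonstration above. The names `PAUSE_MS`, `DEFAULT_GAP_MS`, `pauses_from_text` and `uniform_pauses`, and all of the millisecond values, are assumptions made for this example; they are not taken from the text or from any real TTS system, which would also need to handle stress, pitch and intonation.

```python
import re

# Hypothetical pause lengths in milliseconds; illustrative values only.
PAUSE_MS = {",": 250, ";": 350, ":": 350, ".": 500, "?": 500, "!": 500}
DEFAULT_GAP_MS = 60  # assumed gap after a word with no trailing punctuation


def pauses_from_text(text):
    """Return (word, pause_after_ms) pairs derived from punctuation alone."""
    plan = []
    for token in text.split():
        # Split each token into the word and any trailing punctuation marks.
        match = re.match(r"^(.*?)([,;:.?!]*)$", token)
        word, marks = match.group(1), match.group(2)
        # Use the longest pause among the trailing marks, if there are any.
        pause = max((PAUSE_MS[m] for m in marks), default=DEFAULT_GAP_MS)
        plan.append((word, pause))
    return plan


def uniform_pauses(text, gap_ms=DEFAULT_GAP_MS):
    """Baseline: the same gap after every word, ignoring punctuation."""
    return [(token.strip(",;:.?!"), gap_ms) for token in text.split()]


sentence = "Friends, Romans, countrymen, lend me your ears"
for (word, pause), (_, flat) in zip(pauses_from_text(sentence),
                                    uniform_pauses(sentence)):
    print(f"{word:<12} punctuation-driven: {pause:>3} ms   uniform: {flat} ms")
```

Under these assumed values, the first three words are each followed by a longer pause than the remaining four, which is the time-domain difference the reading exercise is meant to expose. A table of this kind covers only pauses; it says nothing about which words carry emphasis.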
The first method is not truly TTS, since the input is not just text but text plus linguistic markers. Stress markers can be found as accents within the International Phonetic Alphabet, and would normally be added by an expert transcriber (although some experimental