Applied Speech and Audio Processing: With MATLAB Examples
HE said he did not eat this: indicating that someone else said so
He SAID he did not eat this: indicating that you probably do not believe him
He said HE did not eat this: indicating that someone else did
He said he DID not eat this: indicating that he is, or will be, eating this
He said he did NOT eat this: indicating an emphatic negative
He said he did not EAT this: indicating that he did something else with it
He said he did not eat THIS: indicating that he did eat something, but not this

and of course several of the above stresses could be used in combination. Context was discussed in Section 3.3.4, and although it cannot be relied upon in all cases, it can be used to strengthen recognition accuracy upon occasion.

7.6 Speech synthesis

In the wider sense, speech synthesis is the process of creating artificial speech, whether by mechanical, electrical or other means. There is a long history of engineers who attempted this task, including the famous Austrian Wolfgang von Kempelen, who published a mechanical speech synthesiser in 1791 (although it should be noted that he also invented 'The Turk', a mechanical chess-playing machine which astounded the public and scientists across Europe alike for many years before it was revealed that a person, curled up inside, operated the mechanism). However, the more sober Charles Wheatstone built a synthesiser based on this work of von Kempelen in 1857, proving that this device at least was not a hoax.

These early machines used mechanical arrangements of tubes and levers to recreate a model of the human vocal tract, generally fed through air bellows. Different combinations of lever settings could cause the systems to create vowels and consonants on demand. All required an intelligent human to learn the operation of the system in order to decide which sounds to sequence together to form speech.
In fact, much the same methodology of stringing together different phonemes is still in use today, having survived the transition to electrical systems in the 1930s through electronic systems in the 1960s and into computers. Of course, speech synthesis researchers would argue that both quality and usability have improved significantly over the past 300 years.

Today, there are three broad classifications of speech synthesisers, namely text-to-speech systems (TTS), phonetic or linguistic transcription systems, and simple playback systems. We will briefly overview each of these in turn.

7.6.1 Voice playback systems

Simple playback systems, often used for telephone voicemail menus, record words or sentences for later playback. An example would be systems which record spoken digits and then replay them in different sequence to generate larger numbers. Although the quality of the speech itself can be high, these systems do not sound particularly natural because the intonation of various words does not always match listener expectations. In addition these systems are not particularly flexible: they cannot create new words, and require non-volatile storage of every word in their vocabulary.

Advances in speech quantisation, in particular the CELP analysis-by-synthesis systems of Chapter 5, allow stored audio to be compressed in size, such that storage requirements, even for quite a large vocabulary, are not excessive. Otherwise, basic LPC parameters
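As a rough illustration of the digit-playback idea in Section 7.6.1, the following MATLAB sketch strings stored words together to speak a number. It is only a sketch under stated assumptions: the cell array `store`, the sample rate `fs`, the word and gap durations, and the use of sine tones as stand-ins for real recorded digits are all hypothetical and are not taken from the text.

```matlab
% Minimal playback-system sketch (all names and values are assumptions).
fs = 8000;                          % assumed sample rate in Hz
t = (0:round(0.25*fs)-1)/fs;        % assumed 0.25 s per stored word

% Stand-in vocabulary: one placeholder tone per digit 0-9.
% A real system would load ten recorded spoken digits here instead.
store = cell(1,10);
for d = 0:9
    store{d+1} = 0.5*sin(2*pi*(300+50*d)*t);
end

gap = zeros(1, round(0.05*fs));     % assumed 50 ms silence between words

number = '2024';                    % digit string to be replayed
out = [];
for k = 1:length(number)
    d = number(k) - '0';            % convert character to digit value
    out = [out, store{d+1}, gap];   % append stored word, then a pause
end

% soundsc(out, fs);                 % uncomment to listen to the result
```

Because each stored word is replayed unchanged wherever it appears, its intonation cannot adapt to its position in the number, which is the naturalness limitation described in the text. The vocabulary is also fixed by the contents of `store`, so no new words can be produced.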