Applied Speech and Audio Processing: With MATLAB Examples
7.6. Speech synthesis
Systems exist which attempt to add this information automatically, and perhaps future speech recognition systems will be capable of recognising this information. For a more complete system, pitch variations, prosodic information, volume modulation and so on would also need to be added. For the foreseeable future, this task probably requires human intervention, and thus systems that require linguistic marker information are better suited to small, constrained-vocabulary systems.

The second method of improving naturalness requires a machine to ‘read’ a sentence without a priori knowledge of stress and intonation patterns, to extract such information from the text, and to add it into its reading. This is much the same task that a human speaker faces when given a paragraph to read out loud: to scan the text and decide which words to stress, and where to speed up, increase volume, pause and so on. This task is no longer considered to be within the domain of speech synthesis, since it is one of interpretation; it belongs to natural language processing (NLP) research.

Figure 7.6 provides a block diagram of a generic concatenative speech synthesiser of the second type mentioned here. A database of stored phonetic sounds is strung together by the system to match the phonemes of the input text. For some words, the phonetic sequence is generated through phonetisation rules, but this can be overridden by a dictionary of exceptions to the rules. Occasionally, context dictates a different pronunciation of a word, and this is detected through the use of an NLP system. NLP is also used to identify places in the recreated phoneme string at which to add stress patterns, probably including changes in prosody, pitch variation, volume variation, and so on. It should of course be noted that this is just one possible method of performing synthesis: there are countless alternatives represented in the research literature [21].
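The pipeline of Figure 7.6 can be sketched in a few lines of code. The following is a deliberately minimal illustration, not a real synthesiser: the letter-to-sound rules, the exception dictionary and the ‘waveform’ units are all invented stand-ins, and a real unit database would hold recorded audio with smoothing applied at the joins.

```python
# Toy sketch of a concatenative synthesiser in the style of Figure 7.6.
# All rules, dictionary entries and 'waveforms' below are invented for
# illustration only.

LETTER_TO_SOUND = {'c': 'K', 'a': 'AH', 't': 'T'}   # naive phonetisation rules
EXCEPTIONS = {'the': ['DH', 'AH']}                   # dictionary overriding the rules

# Unit database: in a real system each entry is a stored waveform segment;
# here a short list of samples stands in for one.
UNIT_DB = {
    'DH': [0.1, 0.2],
    'AH': [0.3, 0.4],
    'K':  [0.5],
    'T':  [0.6],
}

def phonetise(text):
    """Convert text to a phoneme string: exception dictionary first,
    letter-to-sound rules for everything else."""
    phones = []
    for word in text.lower().split():
        if word in EXCEPTIONS:
            phones.extend(EXCEPTIONS[word])
        else:
            phones.extend(LETTER_TO_SOUND[ch] for ch in word
                          if ch in LETTER_TO_SOUND)
    return phones

def synthesise(text):
    """String together the stored unit for each phoneme in turn."""
    samples = []
    for ph in phonetise(text):
        samples.extend(UNIT_DB[ph])   # concatenate units from the database
    return samples

print(phonetise('the cat'))   # exception word, then rule-driven word
print(synthesise('the cat'))  # the concatenated 'waveform'
```

The NLP stages of Figure 7.6 (context-dependent pronunciation and stress placement) would sit between `phonetise` and the unit lookup, rewriting the phoneme string and attaching prosodic targets before any audio is assembled.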
7.6.4 Practical speech synthesis

The Festival speech synthesis system, from the University of Edinburgh Centre for Speech Technology Research, is probably the most common synthesis system in use for research today. It is free software, distributed open source, which researchers and developers can download, test and modify as required [22]. It offers a highly configurable ‘general framework’ for speech synthesis research, as well as a fully working system that can synthesise speech in English, Welsh, American English and Spanish.

Festival is capable of uttering whole sentences, which it does by formulating a grammar or syntax structure. At the word level, a sequence of words in time is related, inside Festival, to other attributes of those words, and words themselves comprise phones. Word functions are identified and used to allow different rules to apply to those words. In this way, Festival could, for example, utter the subject of a sentence at a slower speed, and a verb slightly louder than other words, since basic speaking information can be calculated in real time from syntactic information [23].

In common with many systems, Festival uses a pronunciation lexicon, which looks up the pronunciation of a word (or a word part, or a near match). Unknown words are pronounced by letter-to-sound rules. A significant amount of work has also been put into the reading of various punctuation marks and abbreviations in Festival, which considerably improves the perceived naturalness when reading real-world text, such as emails.
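The lexicon-then-fallback strategy described above can be sketched as follows. This is a hedged illustration only: the lexicon entries and the single-letter fallback rules are invented here, and Festival's actual lexicon and trained letter-to-sound rules are vastly larger and more sophisticated.

```python
# Sketch of pronunciation lookup with a letter-to-sound fallback, in the
# spirit of Festival's lexicon. Entries and rules are invented examples.

LEXICON = {
    'speech': ['S', 'P', 'IY', 'CH'],
    'email':  ['IY', 'M', 'EY', 'L'],
}

# Crude one-letter-per-phone rules, consulted only for unknown words.
FALLBACK_LTS = {
    'b': 'B', 'g': 'G', 'o': 'OW', 'r': 'R', 'u': 'AH', 'z': 'Z',
}

def pronounce(word):
    """Return a phoneme list: lexicon hit if present, otherwise a
    pronunciation generated from letter-to-sound rules."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]          # known word: stored pronunciation
    return [FALLBACK_LTS[ch] for ch in word if ch in FALLBACK_LTS]

print(pronounce('speech'))    # lexicon hit
print(pronounce('zorgub'))    # unknown word, rule-generated
```

The key design point is that the lexicon and the rules are complementary: the lexicon guarantees correct pronunciation for known (often irregular) words, while the rules give graceful, if imperfect, behaviour on names and neologisms that no dictionary could anticipate.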