Applied Speech and Audio Processing: With MATLAB Examples


7.6. Speech synthesis
systems exist which attempt to add this information automatically, and perhaps future
speech recognition systems would be capable of recognising this information?). For a
more complete system, pitch variations, prosodic information, volume modulation and
so on would also need to be added. For the foreseeable future, this task probably
requires human intervention, and thus systems that require linguistic marker information
are better suited to small, constrained-vocabulary applications.
The second method of improving naturalness requires a machine to ‘read’ a sentence
without a priori knowledge of stress and intonation patterns, to extract such information
from the text, and to add this into its reading. This is much the same task that a human
speaker faces when given a paragraph to read out loud: to scan the text, and decide
which words to stress, where to speed up, increase volume, pause and so on. This task
is no longer considered to be within the domain of speech synthesis: since it is one of
interpretation, it is part of natural language processing (NLP) research.
Figure 7.6 provides a block diagram of a generic concatenative speech synthesiser of
the second type mentioned here. A database of stored phonetic sounds is strung together
by the system to match the phonemes of the input text. For some words, the phonetic
sequence is generated through phonetisation rules, but can be overridden by a dictionary
of exceptions to the rule. Occasionally, context would dictate different pronunciation
of a word, and this would be detected through the use of an NLP system. NLP is also
used to identify where in the recreated phoneme string stress patterns should be added, probably
including changes in prosody, pitch variation, volume variation, and so on. It should of
course be noted that this is just one possible method of performing synthesis: there are
countless alternatives represented in research literature [21].
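The flow just described (phonetisation rules, an overriding dictionary of exceptions, and a database of stored sounds strung together) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the structure of any particular synthesiser: the names EXCEPTIONS, LETTER_RULES, UNIT_DB, phonetise and synthesise are hypothetical, the one-letter-per-phoneme rules are deliberately naive, and short sine tones stand in for recorded phonetic units. The NLP stage (context-dependent pronunciation and stress placement) is omitted.

```python
import math

SAMPLE_RATE = 8000   # Hz, assumed for this toy example
UNIT_SAMPLES = 400   # 50 ms per stored unit at 8 kHz (assumption)

# Hypothetical dictionary of exceptions: words that break the rules below.
EXCEPTIONS = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}

# Hypothetical, deliberately naive phonetisation rules (one letter, one phoneme).
LETTER_RULES = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh",
    "n": "n", "o": "ow", "t": "t", "w": "w",
}


def phonetise(word):
    """Return a phoneme list: the dictionary entry if present, otherwise rules.

    Letters with no rule are silently skipped in this sketch.
    """
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]


def make_unit(index):
    """Stand-in for a recorded phonetic sound: a short tone, one pitch per phoneme."""
    freq = 200.0 + 40.0 * index
    return [math.sin(2.0 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(UNIT_SAMPLES)]


# Toy database of stored sounds, keyed by phoneme label.
PHONEMES = sorted(set(LETTER_RULES.values())
                  | {p for entry in EXCEPTIONS.values() for p in entry})
UNIT_DB = {p: make_unit(i) for i, p in enumerate(PHONEMES)}


def synthesise(text):
    """String stored units together to match the phonemes of the input text."""
    phonemes = []
    for word in text.split():
        phonemes.extend(phonetise(word))
    samples = []
    for p in phonemes:
        samples.extend(UNIT_DB[p])
    return phonemes, samples


phonemes, samples = synthesise("one bad cat")
print(phonemes)
print(len(samples), "samples")
```

Checking the dictionary before the rules is what lets an irregular word such as "one" override the letter-by-letter result. A real system would also smooth the joins between units rather than butt them together as done here.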
7.6.4 Practical speech synthesis
The Festival speech synthesis system by the University of Edinburgh Centre for Speech
Technology Research is probably the most common synthesis system in use for research
today. It is free software, distributed open-source style, which researchers and developers
can download, test and modify as required [22]. It offers a highly configurable ‘general
framework’ for speech synthesis research, as well as a fully working system that can
synthesise speech in English, Welsh, American English and Spanish.
Festival is capable of uttering whole sentences, which it can do by formulating a
grammar or syntax structure. In fact, at the word level, a sequence of words in time is
related, inside Festival, to other attributes of those words. Words themselves comprise
phones. Word functions are identified, and used to allow different rules to apply to those
words. In this way, Festival could for example utter the subject of a sentence at a slower
speed, and a verb slightly louder than other words, since basic speaking information can
be calculated, in real time, from syntactic information [23].
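As a rough illustration of how word function can select different speaking rules, the Python sketch below attaches a rate and a gain to each word according to a role label. It does not use Festival's own interfaces or data structures: WordItem, PROSODY_RULES and apply_prosody are hypothetical names, the role labels are supplied by hand instead of being derived from a syntactic analysis, and the multipliers are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class WordItem:
    text: str
    role: str          # hypothetical label: "subject", "verb" or "other"
    rate: float = 1.0  # relative speaking rate (1.0 = default)
    gain: float = 1.0  # relative volume (1.0 = default)


# Hypothetical rule table: role -> (rate multiplier, gain multiplier).
PROSODY_RULES = {
    "subject": (0.9, 1.0),  # spoken slightly slower
    "verb": (1.0, 1.1),     # spoken slightly louder
}


def apply_prosody(words):
    """Scale each word's rate and gain by the rule for its role, if any."""
    for w in words:
        rate_mul, gain_mul = PROSODY_RULES.get(w.role, (1.0, 1.0))
        w.rate *= rate_mul
        w.gain *= gain_mul
    return words


sentence = [
    WordItem("the", "other"),
    WordItem("dog", "subject"),
    WordItem("barks", "verb"),
]
for w in apply_prosody(sentence):
    print(w.text, w.role, w.rate, w.gain)
```

Keeping the rules in a table separate from the words mirrors the idea in the text: the words carry attributes, and the rule applied depends only on the function each word serves.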
In common with many systems, Festival uses a pronunciation lexicon. This looks
up the pronunciation of a word (or a word part, or a near match). Unknown words are
pronounced by letter-to-sound rules. A significant amount of work has also been put into
the reading of various punctuation marks and abbreviations in Festival, which considerably
improves the perceived naturalness when reading real-world text, such as emails. For
