Applied Speech and Audio Processing: With MATLAB Examples

Advanced topics
Figure 7.5: Block diagram of a generic speech recognition system, showing input speech cleaned
up and filtered in a pre-processing block, feature extraction, and then the matching and decoding
processes driven from predefined models of the sounds, language and words being
recognised.
such as English, the number of phonemes in different words can vary significantly, so
again a dictionary can be used to adjust the parameter n in the n-gram language model.
The output of the ASR system could be given as a string of phonemes, but is more
usefully delivered as a sequence of recognised words, and again this depends upon the
particular application and configuration of the system. The models of speech themselves,
namely the acoustic model, language model, and dictionary as shown in Figure 7.5, are
particularly important in an ASR system. A sound missing from the acoustic model,
language features not covered by the language model, and words not in the dictionary
cannot be recognised. Although the dictionary is often created from predefined word
lists, the acoustic and language models are usually the result of training. Whilst it is
theoretically possible to define language and acoustic rules by hand, it is far easier and
more accurate to train a system using representative speech to build up these models
statistically. For a system operating with different speakers, it can be better to detect
who is speaking (see Section 7.3) and then switch to an individual acoustic model than to
rely on one large acoustic model that covers everyone. Similarly, for systems encompassing
several different languages or dialects (see Section 7.4), it can be better to detect these
and switch language models appropriately.
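
To make the idea of a statistically trained language model more concrete, the following is a minimal sketch of a bigram (n = 2) model estimated simply by counting word pairs. The three-sentence corpus, the query phrases and all variable names are invented for illustration and are not taken from any real recogniser. It also shows why language features not covered by the model cannot be recognised: a word pair that never occurred in training receives a probability of zero.

```matlab
% Illustrative sketch only: a bigram language model built by counting.
% The training corpus below is a made-up assumption, not real data.
corpus = {'switch the light on', ...
          'switch the light off', ...
          'turn the radio on'};

pairCount = containers.Map('KeyType', 'char', 'ValueType', 'double');
wordCount = containers.Map('KeyType', 'char', 'ValueType', 'double');

for s = 1:numel(corpus)
    words = strsplit(corpus{s}, ' ');
    for k = 1:(numel(words) - 1)
        w1  = words{k};
        key = [w1 ' ' words{k + 1}];
        % Count how often w1 appears as the first word of a pair
        if isKey(wordCount, w1)
            wordCount(w1) = wordCount(w1) + 1;
        else
            wordCount(w1) = 1;
        end
        % Count how often this particular word pair appears
        if isKey(pairCount, key)
            pairCount(key) = pairCount(key) + 1;
        else
            pairCount(key) = 1;
        end
    end
end

% Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1 *).
% Pairs never seen in training keep a probability of zero.
queries = {'the light', 'the radio', 'the door'};
probs = zeros(1, numel(queries));
for q = 1:numel(queries)
    parts = strsplit(queries{q}, ' ');
    if isKey(pairCount, queries{q})
        probs(q) = pairCount(queries{q}) / wordCount(parts{1});
    end
    fprintf('P(%s | %s) = %.2f\n', parts{2}, parts{1}, probs(q));
end
```

Here 'the light' is twice as likely as 'the radio' because it occurred twice as often in the training text, while 'the door' is given zero probability because it was never seen. Practical systems usually soften this with some form of smoothing or back-off, which is not shown here.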
Up to this point, we have discussed ASR systems in general. However, it is instructive
to turn our attention briefly to a particular example of state-of-the-art speech
recognition: Sphinx.
The open-source Sphinx recogniser, originally developed at Carnegie Mellon
University in the USA, is one of the best examples of a flexible modern speech recognition
system. It can be used for single-word recognition or scaled up to large vocabularies
of tens of thousands of words, can run on a tiny embedded system (PocketSphinx) or
on a large and powerful server (which could run the Java-based Sphinx-4), and is
constantly updated and evaluated within the speech recognition research field.
