Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
Figure 7.5 Block diagram of a generic speech recognition system, showing input speech cleaned up and filtered in a pre-processing block, feature extraction, and then the matching and decoding processes driven from predefined models of the sounds, language and words being recognised.

…such as English, the number of phonemes in different words can vary significantly, so again a dictionary can be used to adjust the parameter n in the n-gram language model. The output of the ASR system could be given as a string of phonemes, but is more usefully delivered as a sequence of recognised words; again, this depends upon the particular application and configuration of the system.

The models of speech themselves, namely the acoustic model, the language model and the dictionary shown in Figure 7.5, are particularly important in an ASR system: a sound missing from the acoustic model, a language feature not covered by the language model, or a word not in the dictionary cannot be recognised. Although the dictionary is often created from predefined word lists, the two models are usually the result of training. Whilst it is theoretically possible to define language and acoustic rules by hand, it is far easier and more accurate to train a system on representative speech, building up these models statistically.

For a system operating with different speakers, it can be better to detect who is speaking (see Section 7.3) and then switch to an individual acoustic model than to have one large acoustic model covering everyone. Similarly, for systems encompassing several different languages or dialects (see Section 7.4), it can be better to detect these and switch language models appropriately.

Up to this point, we have discussed ASR systems in general. However, it is instructive to turn briefly to a particular example of state-of-the-art speech recognition: Sphinx.
The open-source Sphinx recogniser, originally developed at Carnegie Mellon University in the USA, is one of the best examples of a flexible modern speech recognition system. It can be used for single-word recognition or expanded to large vocabularies of tens of thousands of words; it can run on a tiny embedded system (PocketSphinx) or on a large and powerful server (which could run the Java-language Sphinx-4); and it is constantly updated and evaluated within the speech recognition research community.
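The statistical language model and dictionary discussed earlier in this section can be illustrated with a minimal bigram (n = 2) sketch. This is not the book's code, nor the way Sphinx stores its models. The tiny training corpus, the `DICTIONARY` set and the helper names `train_bigram` and `log_prob` are hypothetical, and add-one (Laplace) smoothing is an assumption chosen only for brevity. The sketch shows two points from the text: the model is built by counting word pairs in representative training material, and a word missing from the dictionary can never be recognised.

```python
import math
from collections import Counter

# Hypothetical stand-in for the ASR dictionary: here just the set of words the
# recogniser may output (a real ASR dictionary also maps each word to phonemes).
DICTIONARY = {"call", "home", "the", "office", "play", "music"}


def train_bigram(corpus):
    """Count bigrams (n = 2) and their left contexts in representative training text."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        # <s> and </s> mark the start and end of each sentence
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words[:-1], words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def log_prob(hypothesis, model, dictionary=DICTIONARY):
    """Add-one (Laplace) smoothed bigram log-probability of a word sequence.

    Returns -inf if any word is outside the dictionary, since such a word
    can never be recognised.
    """
    context_counts, bigram_counts = model
    words = hypothesis.split()
    if any(w not in dictionary for w in words):
        return float("-inf")
    vocab_size = len(dictionary) + 1  # possible next tokens: dictionary words plus </s>
    words = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        total += math.log(p)
    return total


if __name__ == "__main__":
    corpus = ["call home", "call the office", "play music", "play the music"]
    model = train_bigram(corpus)
    # A word order seen in training scores higher than an unseen one
    print(log_prob("call home", model) > log_prob("home call", model))  # True
    # An out-of-dictionary word cannot be recognised at all
    print(log_prob("call mum", model))  # -inf
```

In a complete recogniser, a score like this would be combined with acoustic-model scores during decoding. Add-one smoothing is used here only to keep the sketch short. Practical n-gram language models usually use better-behaved smoothing or back-off schemes.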