Applied Speech and Audio Processing: With MATLAB Examples
Automatic speech recognition (ASR)
Automatic speech recognition (ASR) describes a system that can recognise speech without additional user input.

Continuous speech recognition describes a speech recognition system that can recognise continuous sentences of speech. In theory this would not require a user to pause when speaking, and would include dictation and transcription systems. The alternative is a discrete word recognition system, used primarily for handling vocal commands, that recognises single words delimited by pauses.

Natural language processing (NLP), whilst not strictly limited to speech, describes the computational methods needed for a computer to understand the meaning of what is being said, rather than simply knowing what words have been said. For an automated transcription system, the meaning may be irrelevant, but to create a virtual butler able to cater to human needs, the meaning of what is said would be important.

In general, we will consider primarily the case of discrete word recognisers, since this is a lower-level recognition task, closer to the physical parameters being analysed.

7.5.2 Speech recognition performance

Established researcher Victor Zue and colleagues have identified several parameters that can be used to characterise speech recognition systems and their performance (given in Section 1.2 of a wide-ranging survey report [16]). Based upon this work, Table 7.1 lists several characteristic parameters of these systems.

Table 7.1. Speech recognition system parameters.

Parameter      Typical range
Speech type    Single words–continuous sentences
Training       In advance–continuous
Users          Single–open access
Vocabulary     Small–large
SNR            Low–high
Transducer     Restricted–unrestricted

Many, if not all, recognition systems require some form of training which acclimatises the system, either to the speech of a particular individual, or to a group of individuals. This can be accomplished in advance where the speaker (or speakers) are known, or otherwise could be an ‘on-line’ gradual training during operation.
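The discrete word recogniser described earlier in this section depends on pauses to delimit words. The following is a minimal Python sketch of that segmentation step, not a method taken from the text: it assumes a simple frame-energy threshold, and the function names, frame length, threshold and minimum pause length are all hypothetical choices made for illustration. The demonstration signal is synthetic (two tone bursts separated by silence), not recorded speech.

```python
import math


def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def segment_words(samples, frame_len=80, threshold=0.01, min_pause_frames=3):
    """Return (start_sample, end_sample) pairs, end exclusive, one per word.

    A word is a run of frames whose energy reaches `threshold`. It is closed
    once `min_pause_frames` consecutive quiet frames follow it. All parameter
    values here are illustrative assumptions.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None  # first frame of the word currently being tracked
    quiet = 0     # consecutive quiet frames seen since the last active frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                end = i - quiet + 1  # index of the first quiet frame
                segments.append((start * frame_len, end * frame_len))
                start = None
                quiet = 0
    if start is not None:  # signal ended before a full pause was seen
        end = len(energies) - quiet
        segments.append((start * frame_len, end * frame_len))
    return segments


def make_demo_signal(rate=8000):
    """Synthetic test signal: silence, tone, silence, tone, silence."""
    def tone(n):
        return [0.5 * math.sin(2 * math.pi * 440 * k / rate) for k in range(n)]
    silence = [0.0] * 400
    return silence + tone(800) + silence + tone(800) + silence


if __name__ == "__main__":
    print(segment_words(make_demo_signal()))
```

Real speech would need more care than this sketch takes: low-energy consonants and background noise both defeat a fixed threshold, which is one reason the SNR and transducer parameters in Table 7.1 matter.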
The question of whether the system has been designed to operate for single users, small groups, or for unrestricted users evidently impacts the training methods employed.

Of the parameters considered so far, a single-word, pre-trained, single-user system is by far the simplest to design with reasonable accuracy. Any deviation from this simple combination will incur a penalty in terms of design complexity, reduced accuracy, or both. Recognisers that use rules of syntax, perhaps some sort of artificial grammar, might well benefit from being supplied with continuous speech, but by and large, single words are easier to recognise.

In terms of vocabulary size, it is reasonable to assume that the larger the vocabulary, the more difficulty any system will have in accurately detecting a word. In fact, the same is true for human recognition of speech. Figure 7.4 shows a plot resulting from a re-examination of the data shown in Figure 3.6 (Section 3.3.4), which itself is derived from experimental results obtained over half a century ago by Miller et al. [17]. This figure plots results for various signal-to-noise ratios of speech, as heard by human listeners, and excludes the extreme accuracy points above 85% or below 20%, where other effects such as saturation come into play. The ‘recognition accuracy’ or articulation index is plotted against the logarithm of the vocabulary size, showing a clearly logarithmic relationship. In fact, the logarithmic relationship has been shown to be present in some of the largest variable-vocabulary recognisers, such as the famous Bellcore telephone directory assistance system [18].

It should be noted that the continuous speech recognition research field also includes topic-specific recognisers, which are trained with a vocabulary of subject-specific words (such as a recognition system aimed at understanding medical terminology, or one trained on legal terminology). Another subset of the research field covers systems that attempt to determine the topic of the speech under analysis, and switch to the appropriate vocabulary database as required.
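The logarithmic relationship between recognition accuracy and vocabulary size noted above can be written as accuracy = a + b log10(V), where V is the vocabulary size. The following Python sketch fits a and b by ordinary least squares. It is an illustration only: the data points are made up so that they lie exactly on a line with a = 90 and b = -15. They are not the measurements of Miller et al. [17], and the function names are hypothetical.

```python
import math


def fit_log_linear(vocab_sizes, accuracies):
    """Least-squares fit of accuracy = intercept + slope * log10(vocab_size)."""
    xs = [math.log10(v) for v in vocab_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_accuracy(vocab_size, slope, intercept):
    """Accuracy (in percent) predicted by the fitted line for one vocabulary size."""
    return intercept + slope * math.log10(vocab_size)


# Made-up illustrative points (NOT experimental data): accuracy falls by
# 15 percentage points for every tenfold increase in vocabulary size.
VOCAB = [2, 8, 32, 256, 1000]
ACCURACY = [90.0 - 15.0 * math.log10(v) for v in VOCAB]

slope, intercept = fit_log_linear(VOCAB, ACCURACY)

if __name__ == "__main__":
    print("slope per decade of vocabulary size:", round(slope, 3))
    print("intercept:", round(intercept, 3))
    print("predicted accuracy for 100 words:",
          round(predict_accuracy(100, slope, intercept), 3))
```

With measured data the points would scatter about the line, and, as the text notes for Figure 7.4, accuracies near the extremes would be excluded before fitting because saturation breaks the logarithmic trend there.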