Applied Speech and Audio Processing: With MATLAB Examples
5 Speech communications

Chapters 1, 2 and 3 described the foundations of speech signal processing (the characteristics of audio signals in general, and methods of handling and processing them) and the features of speech as produced and understood by humans. In particular, we have covered some basic Matlab methods for handling speech and audio, which we will build upon in this chapter as we explore the handling of speech signals in more depth.

This chapter will consider typical speech handling in terms of speech coding and compression (rather than in terms of speech classification and recognition, which often use similar techniques but are higher level in nature).

We will first consider quantisation of speech, which treats speech as simply a general audio waveform (i.e. it does not incorporate any knowledge of the characteristics of speech). Knowledge of speech features and characteristics allows for parameterisation of the speech signal, and then for source-filter modelling; these will be considered in turn.
Perhaps the pinnacle of achievement in these approaches is the family of CELP (Codebook Excited Linear Prediction) speech compression techniques, which will be discussed in the final section.

Infobox 5.1 Speech coding objectives

Speech compression systems, or codecs, are classified according to what they compress (speech or general audio), how well they compress it, and how well they perform in terms of quality or intelligibility (which were differentiated and measured in Section 3.3.1). To aid in this classification, there is general agreement on the terms used to describe the quality of speech handled by each method. The table below lists the more common terms, and describes them in terms of sample rate, bandwidth, approximate dynamic range and mean opinion score (MOS; see Section 3.3.2). All figures given are approximate guides to the typical characteristics of such systems:

Name                     Sample rate   Bandwidth      Dynamic range   MOS
synthetic quality        –             –              48 dB           2.5–3.5
communications quality   7200 Hz       200–2000 Hz    56 dB           3.5–4.0
toll quality             8000 Hz       200–3200 Hz    64 dB           4.0
network quality          16 000 Hz     20–7000 Hz     80 dB           4.0–4.5

Toll quality refers to 'telephone audio', based on the analogue telephone network, but often brought into the realm of digital measurements. For analogue systems, a signal-to-noise ratio of 30 dB and a bandwidth of 200 Hz to 3.2 kHz, measured at the 3 dB points, is typical.
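
As a rough illustration of how the columns in Infobox 5.1 relate to one another, the following Matlab sketch checks that each listed bandwidth fits below half the sample rate (the Nyquist limit) and converts each dynamic range into an equivalent number of bits. The variable names are hypothetical, and the figure of about 6.02 dB per bit is an assumption taken from the ideal uniform quantiser rule of thumb, not from the infobox itself.

```matlab
% Infobox 5.1 figures, typed in by hand (approximate guides only).
names = {'communications', 'toll', 'network'};
fs    = [7200 8000 16000];   % sample rate, Hz
f_hi  = [2000 3200 7000];    % upper band edge, Hz
dr    = [56 64 80];          % approximate dynamic range, dB

nyquist = fs / 2;            % highest frequency representable at each rate
fits    = f_hi < nyquist;    % true where the band lies below the Nyquist limit

% Assumption (not from the infobox): an ideal uniform quantiser gives
% roughly 20*log10(2), i.e. about 6.02 dB, of dynamic range per bit.
bits = dr / (20*log10(2));

for k = 1:numel(names)
    fprintf('%-15s fs/2 = %5d Hz, band top = %4d Hz, fits = %d, ~%.1f bits\n', ...
            names{k}, nyquist(k), f_hi(k), fits(k), bits(k));
end
```

The synthetic quality row is left out because the infobox gives no sample rate or bandwidth for it. The equivalent-bits figure is only a guide: practical telephone codecs often use non-uniform (companded) quantisation, so their word length does not map directly onto this estimate.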