Applied Speech and Audio Processing: With MATLAB Examples


5 Speech communications
Chapters 1, 2 and 3 described the foundations of speech signal processing (the
characteristics of audio signals in general, and methods of handling and processing
them) and the features of speech as produced and understood by humans. In particular
we have covered some basic Matlab methods for handling speech and audio, which we
will build upon in this chapter as we explore the handling of speech signals in
more depth.
This chapter will consider typical speech handling in terms of speech coding and
compression (rather than in terms of speech classification and recognition, which often
use similar techniques but are higher level in nature). We will first consider quantisation
of speech, which assumes that speech is simply a general audio waveform (i.e. it does
not incorporate any knowledge of the characteristics of speech).
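As a minimal sketch of what treating speech as a general audio waveform means, the following uniform quantiser maps each sample to one of 2^bits equally spaced levels without using any knowledge of speech. It is written in Python purely for illustration; the function name quantise, the 440 Hz test tone, the 8 kHz sample rate and the choice of a mid-rise characteristic over the range -1 to 1 are all assumptions made for this example, not the method developed later in the chapter.

```python
import math

def quantise(samples, bits):
    """Uniform mid-rise quantiser for samples nominally in [-1, 1).

    Hypothetical helper for illustration: returns the reconstructed
    (quantised) sample values, not the integer codes.
    """
    levels = 2 ** bits          # number of quantisation levels
    step = 2.0 / levels         # step size across the range [-1, 1)
    out = []
    for x in samples:
        index = math.floor(x / step)
        # Clamp so that out-of-range samples saturate at the end levels.
        index = max(-levels // 2, min(levels // 2 - 1, index))
        # Reconstruct at the centre of the chosen interval.
        out.append((index + 0.5) * step)
    return out

# Assumed test signal: 100 ms of a 440 Hz tone sampled at 8 kHz.
fs = 8000
tone = [0.9 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]

coarse = quantise(tone, 4)      # 4 bits gives 16 levels
worst_error = max(abs(a - b) for a, b in zip(tone, coarse))
print("step size:", 2.0 / 2 ** 4, "worst-case error:", worst_error)
```

For in-range samples the reconstruction error never exceeds half a step, whatever the signal is; this is exactly the sense in which plain quantisation ignores the characteristics of speech.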
Knowledge of speech features and characteristics allows for parameterisation of the
speech signal, and then source-filter modelling, each of which will be considered in turn.
Perhaps the pinnacle of achievement in these approaches is the CELP (Codebook Excited
Linear Prediction) family of speech compression techniques, which will be discussed in
the final section.
Infobox 5.1 Speech coding objectives
Speech compression systems, or codecs, are classified according to what they compress
(speech or general audio), how well they compress it, and how well they perform in terms
of quality or intelligibility (which were differentiated and measured in Section 3.3.1).
To aid in this classification, there is general agreement on the terms used to describe
the quality of speech handled by each method. The table below lists the more common
terms, and describes them in terms of sample rate, bandwidth, approximate dynamic range
and mean opinion score (MOS – see Section 3.3.2). All figures given are approximate
guides to the typical characteristics of such systems:
Name                     Sample rate   Bandwidth      Dynamic range   MOS
synthetic quality        –             –              48 dB           2.5–3.5
communications quality   7200 Hz       200–2000 Hz    56 dB           3.5–4.0
toll quality             8000 Hz       200–3200 Hz    64 dB           4.0
network quality          16 000 Hz     20–7000 Hz     80 dB           4.0–4.5
Toll quality refers to ‘telephone audio’, based on the analogue telephone network, but often
brought into the realm of digital measurements. For analogue systems a signal-to-noise ratio of
30 dB, and 200 Hz to 3.2 kHz bandwidth, measured at the 3 dB points, is typical.
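The figures in the table can be sanity-checked with two standard relationships: a sampled signal can only represent frequencies below half the sample rate (the Nyquist limit), and the dynamic range of a uniform quantiser grows by about 6.02 dB per bit. The short Python sketch below applies both to the table entries. Reading the dynamic range column as an equivalent uniform word length is an assumption made here for illustration only, and the names CLASSES, nyquist_limit and equivalent_bits are hypothetical.

```python
import math

# Table entries: name -> (sample rate in Hz, upper band edge in Hz,
# dynamic range in dB). Synthetic quality has no stated rate or bandwidth.
CLASSES = {
    "communications quality": (7200, 2000, 56),
    "toll quality": (8000, 3200, 64),
    "network quality": (16000, 7000, 80),
}

def nyquist_limit(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2

def equivalent_bits(dynamic_range_db):
    """Uniform-quantiser word length giving this dynamic range.

    Uses 20*log10(2), roughly 6.02 dB per bit. Treating the table's
    dynamic range this way is an assumption, not a claim from the text.
    """
    return dynamic_range_db / (20 * math.log10(2))

for name, (fs, upper_hz, dr_db) in CLASSES.items():
    fits = upper_hz <= nyquist_limit(fs)
    print(f"{name}: Nyquist limit {nyquist_limit(fs):.0f} Hz, "
          f"band edge {upper_hz} Hz, fits: {fits}, "
          f"about {equivalent_bits(dr_db):.1f} bits")
```

In each case the stated upper band edge lies below the Nyquist limit, leaving some margin for the roll-off of practical anti-aliasing filters.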