Applied Speech and Audio Processing: With MATLAB Examples



Advanced topics
Despite the profusion of applications, there are relatively few fundamental psychoacoustic
models or modelling techniques in use. Most applications employ a subset of a
model predicting the masking effect, an equal-loudness pre-emphasis, and an appreciation
of frequency discrimination. Unfortunately, masking effect models do not often cater for
temporal masking, which we presented as an example above (Figure 7.1).
By contrast, simultaneous frequency masking is well catered for in masking models.
This type of masking was described and illustrated in Section 4.2.8, particularly in
Figure 4.3. Simultaneous frequency masking is well catered for because it is relatively
easily modelled by computer. Models relate to single tones, of given frequency and power,
which cause nearby tones of lower power to be inaudible. In Figure 4.3, the shaded
area showed the extent of the masking effect for the signal of given frequency response:
in essence it is a modified threshold of audibility similar to that of the equal-loudness
contours of Section 4.2.1. The difference in audibility caused by the presence of the tone
is the masking effect.
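The idea of a modified threshold of audibility can be sketched in a few lines. The following is a minimal illustration in Python (not the book's MATLAB), and rests on assumptions that are not taken from this text: the threshold in quiet uses Terhardt's well-known approximation, frequencies are mapped to the Bark scale with a Zwicker-style formula, and the masking pattern of a tone is assumed to be a simple triangle on the Bark axis. The function names, the `offset_db` value and the two slope values are hypothetical choices made only to show the shape of the calculation.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-style approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def tone_mask_db(f_hz, masker_hz, masker_db,
                 offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Assumed triangular masking pattern around a single tone.

    offset_db and the slopes (dB per Bark) are illustrative values only.
    """
    dz = hz_to_bark(f_hz) - hz_to_bark(masker_hz)
    slope = lower_slope if dz < 0 else upper_slope
    return masker_db - offset_db - slope * abs(dz)


def masked_threshold_db(f_hz, masker_hz, masker_db):
    """Modified threshold of audibility: the higher of the two curves."""
    return max(threshold_in_quiet_db(f_hz),
               tone_mask_db(f_hz, masker_hz, masker_db))


def masking_effect_db(f_hz, masker_hz, masker_db):
    """Rise in the audibility threshold caused by the presence of the tone."""
    return masked_threshold_db(f_hz, masker_hz, masker_db) - threshold_in_quiet_db(f_hz)


if __name__ == "__main__":
    for f in (500, 900, 1000, 1100, 2000, 10000):
        print(f, round(masking_effect_db(f, 1000.0, 70.0), 1))
```

Under these assumptions the masking effect is largest close to the masking tone, falls away more steeply below it than above it, and drops to zero once the assumed pattern lies under the threshold in quiet.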
Computational models exist for this masking effect due to tones. Of note is that much
of the historical auditory data used to derive the computerised models was gathered
under well-defined and controlled conditions, using artificial signals such as white noise
and sinewaves. While it is beyond reasonable doubt that the models describe those
scenarios very well, it has not been established with the same confidence that the models
accurately describe complex real sounds. In fact there is even some doubt that they can be
applied to compound sounds [3]. Despite the doubts, these models are used in practice,
and assume that complex sounds can be broken down into a set of tones, each of which
results in a masking effect such as that shown in Figure 4.3, with the overall masking effect
from the sound being the summation of the separate contributions. When calculating
the overall effect, it is possible to introduce nonlinear ‘corrections’ to the model to
compensate for the fact that the effect is not a straightforward summation.
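As a rough illustration of summing separate masking contributions, the sketch below (in Python rather than the book's MATLAB) combines per-tone masking levels in the intensity domain. The exponent `alpha` is a hypothetical stand-in for a nonlinear correction: `alpha = 1` gives a plain intensity sum, while other values give a power-law combination. This is one assumed form of correction chosen for illustration, not the one used by any particular model in the text.

```python
import math


def db_to_intensity(level_db):
    """Convert a level in dB to a relative intensity."""
    return 10.0 ** (level_db / 10.0)


def intensity_to_db(intensity):
    """Convert a relative intensity back to dB."""
    return 10.0 * math.log10(intensity)


def combine_masks_db(mask_levels_db, alpha=1.0):
    """Combine the masking levels of several tones at one frequency.

    alpha = 1.0 is a straightforward intensity summation; alpha != 1.0 is an
    assumed power-law 'correction' (hypothetical parameter, for illustration).
    """
    total = sum(db_to_intensity(m) ** alpha for m in mask_levels_db)
    return intensity_to_db(total ** (1.0 / alpha))


if __name__ == "__main__":
    masks = [40.0, 40.0]
    print(round(combine_masks_db(masks), 2))             # plain summation
    print(round(combine_masks_db(masks, alpha=0.5), 2))  # with assumed correction
```

With two equal 40 dB contributions, plain summation adds about 3 dB, whereas the assumed correction with `alpha = 0.5` adds about 6 dB, which shows how strongly the choice of correction can change the predicted threshold.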
Perhaps more useful is to use a critical band model of the ear. The loudest tone in
each critical band is audible, and the masking effect is the weighted sum of all sound
and noise components within that band. Whilst this does not account for several auditory
factors, it does model the overall situation quite well – especially for situations with a
clear distinction between wanted signal and interfering noise.
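The critical band idea can be sketched as follows, again in Python rather than the book's MATLAB and under stated assumptions: components are grouped into bands by the integer part of their Bark value (Zwicker-style approximation), the loudest component in each band is reported as the audible one, and the band's masking level is a weighted sum of component intensities. The function `critical_band_summary` and its `weight_fn` parameter are hypothetical names, and the default weighting of 1.0 for every component is a placeholder, not a value from the text.

```python
import math
from collections import defaultdict


def hz_to_bark(f_hz):
    """Zwicker-style approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_band_summary(components, weight_fn=lambda f_hz: 1.0):
    """Summarise (freq_hz, level_db) components per critical band.

    Returns {band_index: (loudest_freq_hz, loudest_level_db, band_mask_db)}.
    weight_fn is a hypothetical per-component weighting (default: equal weights).
    """
    bands = defaultdict(list)
    for f_hz, level_db in components:
        bands[int(hz_to_bark(f_hz))].append((f_hz, level_db))

    summary = {}
    for band, items in sorted(bands.items()):
        # The loudest component in the band is taken as the audible one.
        loud_f, loud_db = max(items, key=lambda c: c[1])
        # Weighted sum of all component intensities within the band.
        power = sum(weight_fn(f) * 10.0 ** (l / 10.0) for f, l in items)
        summary[band] = (loud_f, loud_db, 10.0 * math.log10(power))
    return summary


if __name__ == "__main__":
    spectrum = [(1000.0, 60.0), (1050.0, 50.0), (4000.0, 40.0)]
    for band, (f, level, mask) in critical_band_summary(spectrum).items():
        print(band, f, level, round(mask, 2))
```

Here the 1000 Hz and 1050 Hz components fall in the same band, so only the 1000 Hz tone is treated as audible there, while the 4000 Hz component sits alone in a higher band.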
In the remainder of this section we will use this approach to develop a simple but
usable psychoacoustic model that has been applied, and tested, in several applications.
The model involves several stages of processing. The assumption is that a particular
audio signal is modelled to determine a threshold of masking (audibility) due to the
sounds it contains. Since there is very little likelihood that the absolute signal levels
recorded on computer are the same as those that would impinge on the ear of a listener, it
must be stressed that – even assuming no variability among different people – the conclusions
drawn by this model should be treated as approximations. This model is provided
to enable the reader to rapidly enter the field of psychoacoustics: to adjust, extend and
improve on the system described. Many competing – and probably better – models exist,
