7.1. Psychoacoustic modelling
since this is such an active research area. Above all, the reader is encouraged to access
the literature and develop their own systems.
The processing stages of the simple psychoacoustic model that will now be presented
are as follows:
1. spectral analysis;
2. critical band warping;
3. critical band function convolution;
4. equal-loudness pre-emphasis;
5. intensity-loudness conversion.
Following the model introduction we will discuss the use of the model and its applicability
to speech.
7.1.1 Spectral analysis
Since the simultaneous masking effect occurs in the frequency domain, the first step is to
select a frame of audio to analyse, window it, and convert to a spectral representation.
For an example frame of length 256 in Matlab, we would do the following:
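% speech holds one 256-sample frame as a column vector
S=abs(fft(hamming(256).*speech));
S=S(1:128); % keep the first half; the rest mirrors it for real input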
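To try the analysis without a recording to hand, the following sketch builds a synthetic frame and repeats the same steps. The 8 kHz sample rate, the 1 kHz test tone and the variable names are assumptions made purely for illustration, and the Hamming window is written out explicitly so that no toolbox is required.

```matlab
% Sketch: spectral analysis of one 256-sample frame.
% Assumptions: Fs = 8000 Hz, and a 1 kHz sinusoid stands in for real speech.
Fs = 8000;                            % assumed sample rate (Hz)
N  = 256;                             % frame length, as in the text
n  = (0:N-1)';                        % sample index as a column vector
speech = sin(2*pi*1000*n/Fs);         % synthetic test frame
w  = 0.54 - 0.46*cos(2*pi*n/(N-1));   % Hamming window, written out by hand
S  = abs(fft(w.*speech));             % magnitude spectrum of the windowed frame
S  = S(1:N/2);                        % keep bins 0..N/2-1
f  = (0:N/2-1)'*Fs/N;                 % frequency of each retained bin (Hz)
[~, k] = max(S);                      % index of the largest spectral component
peak_hz = f(k);                       % frequency of that component
```

With these assumed values the tone falls exactly on an FFT bin, so peak_hz should equal the tone frequency. For real speech the frame would instead be cut from the recording.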
7.1.2 Critical band warping
Now the spectrum S(ω) needs to be warped in the frequency domain so that, instead of
being indexed in hertz, it fits a Bark scale (Section 4.3.2). The spectral index is then
expressed in Barks, so the effect of the critical band filters can be calculated using
a Bark-domain spreading function. There are, of course, several competing models from
different authors to account for spreading across critical bands. Some of the more
prominent ones will be compared in Section 7.1.7, although for simplicity we shall use
the approach of Hermansky here, defined by the following equation [4]:
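\[
\Psi(\Omega) =
\begin{cases}
0 & \text{for } \Omega < -1.3 \\
10^{2.5(\Omega+0.5)} & \text{for } -1.3 \le \Omega \le -0.5 \\
1 & \text{for } -0.5 < \Omega < 0.5 \\
10^{-1.0(\Omega-0.5)} & \text{for } 0.5 \le \Omega \le 2.5 \\
0 & \text{for } \Omega > 2.5
\end{cases}
\tag{7.1}
\]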
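As a rough sketch, Equation (7.1) can be written as a vectorised Matlab anonymous function. The names psi and Omega are our own, and we assume the argument is an offset in Barks from the centre of a critical band, with the function taken as zero beyond 2.5 Bark.

```matlab
% Sketch of Equation (7.1) as a vectorised function of Bark offset W.
% Each logical mask selects one branch; outside -1.3..2.5 every mask is
% false, so the result is zero there.
psi = @(W) (W >= -1.3 & W <= -0.5) .* 10.^( 2.5*(W+0.5)) ...
         + (W >  -0.5 & W <   0.5) ...
         + (W >=  0.5 & W <=  2.5) .* 10.^(-1.0*(W-0.5));

Omega = (-2:0.1:3)';      % assumed grid of Bark offsets for inspection
Psi   = psi(Omega);       % critical band function sampled on that grid
```

Plotting Psi against Omega, for example with plot(Omega, Psi), should show a flat top one Bark wide with a steeper skirt on the low side than on the high side.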
We have already defined two Matlab functions to convert between the Hz and Bark scales
in either direction in Section 4.3.2, named f2bark() and bark2f(). We will use
these to construct an analysis with 40 critical bands uniformly spaced in the Bark
domain, then define the critical band filters by calculating the Bark range of each filter.
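One possible way to lay out the bands is sketched here. To keep the fragment self-contained, the two anonymous functions are hypothetical stand-ins based on Traunmüller's approximation, not the f2bark() and bark2f() of Section 4.3.2, and the 8 kHz sample rate is likewise an assumption. The Bark range of each filter is taken from the non-zero extent of Equation (7.1), assuming the offset is measured from the band centre.

```matlab
% Sketch: 40 critical band centres uniformly spaced in the Bark domain.
% hz2bark/bark2hz are stand-ins (Traunmueller approximation), NOT the
% f2bark()/bark2f() functions defined in Section 4.3.2.
hz2bark = @(hz) 26.81*hz./(1960+hz) - 0.53;   % Hz -> Bark (approximate)
bark2hz = @(bk) 1960*(bk+0.53)./(26.28-bk);   % Bark -> Hz (algebraic inverse)

Fs   = 8000;                  % assumed sample rate (Hz)
b    = 40;                    % number of critical bands, as in the text
Bmax = hz2bark(Fs/2);         % Bark value at the Nyquist frequency
Bc   = linspace(0, Bmax, b)'; % band centres, uniform in Bark
Fc   = bark2hz(Bc);           % the same centres expressed in Hz

% Bark range covered by each filter, assuming the non-zero extent of
% Equation (7.1) is measured relative to the band centre.
Blo  = Bc - 1.3;              % lower edge of each filter (Bark)
Bhi  = Bc + 2.5;              % upper edge of each filter (Bark)
```

The centres in Fc are closely spaced at low frequencies and widen towards the Nyquist frequency, which is the behaviour the Bark warping is intended to provide.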