Applied Speech and Audio Processing: With MATLAB Examples


7.3. Speaker classiﬁcation

Advanced topics
Figure 7.3
Comparison of several critical-band spreading functions from various authors.
The models are applied on a per-band basis. The centre frequency of each critical
band is used as a reference, and sound falling in that critical band will have a masking
effect on other sounds as represented by the function. For example, using the Hermansky
model, a sound within the plotted critical band will mask a sound 0.8 Bark lower whose
amplitude is 0.2 times its own or less.
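The worked figure above can be checked with a minimal sketch of one spreading function. It assumes the piecewise form commonly quoted from Hermansky's perceptual linear prediction analysis; the function name `hermansky_spreading` and the breakpoints used here are assumptions of this sketch, and the curve plotted in Figure 7.3 may differ in detail.

```python
def hermansky_spreading(delta_bark):
    """Masking weight at an offset (in Bark) from the masker's band centre.

    Assumed piecewise form, as commonly quoted from Hermansky's PLP analysis.
    A negative offset means the masked sound lies below the masker.
    """
    if delta_bark < -1.3 or delta_bark > 2.5:
        return 0.0  # outside the assumed spread of the masker: no masking
    if delta_bark <= -0.5:
        return 10.0 ** (2.5 * (delta_bark + 0.5))  # steep lower skirt
    if delta_bark < 0.5:
        return 1.0  # flat top around the band centre
    return 10.0 ** (-1.0 * (delta_bark - 0.5))  # shallower upper skirt


if __name__ == "__main__":
    for offset in (-0.8, 0.0, 1.5):
        print(f"{offset:+.1f} Bark -> weight {hermansky_spreading(offset):.3f}")
```

Under these assumptions the weight 0.8 Bark below the band centre is about 0.18, in line with the figure of roughly 0.2 quoted above.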
7.2. Perceptual weighting
The perceptual error weighting ﬁlter (PEWF) is a common sight within speech coders.
In this context it has a particular meaning which may not be quite the same as its
interpretation elsewhere. Based on its context of speech coding, it makes use of linear
prediction parameters, which themselves encode vocal tract information. It uses these
parameters to ‘strengthen’ resonances, and thus to increase formant power in encoded
speech.
The idea is that, since the formant regions are more relevant to human perception,
the weighting process improves the perception of these. Whilst this argument is true
of speech, for general music systems, perceptual weighting more often involves either
the use of a perceptual model, or simply the application of a digital version of the A-weighting
filter. Here we will present a typical PEWF as found within a CELP speech
coder. The LPC synthesis filter is termed $H(z)$, and two bandwidth expansion factors
$\zeta_1$ and $\zeta_2$ are used, with the relationship $\zeta_1 < \zeta_2 \le 1$. Then the weighting filter $W(z)$ is
defined as:

$$W(z) = \frac{1 - H(z/\zeta_1)}{1 - H(z/\zeta_2)}. \tag{7.6}$$
Remembering that the LPC synthesis ﬁlter is deﬁned as:
$$H(z) = \sum_{k=1}^{P} a_k z^{-k} \tag{7.7}$$
then the frequency scaled version will be:
$$H(z/\zeta) = \sum_{k=1}^{P} \zeta^{k} a_k z^{-k}. \tag{7.8}$$
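In coefficient terms, Equation (7.8) says that the k-th LPC coefficient is simply multiplied by the k-th power of the expansion factor. A minimal sketch follows; the function name `bandwidth_expand` and the example coefficients are illustrative assumptions, not taken from the text.

```python
def bandwidth_expand(a, zeta):
    """Return the coefficients of H(z/zeta) as in Equation (7.8).

    `a` holds a_1..a_P (so a[0] is a_1); each a_k is scaled by zeta**k.
    """
    return [(zeta ** k) * a_k for k, a_k in enumerate(a, start=1)]


if __name__ == "__main__":
    a = [1.2, -0.8]  # illustrative second-order coefficients, not from the text
    print(bandwidth_expand(a, 0.95))
```

With a factor of exactly 1 the coefficients are returned unchanged, and a factor below 1 shrinks the higher-order coefficients progressively more.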
Taking Equation (7.8) in difference form, and substituting into Equation (7.6), the PEWF
is quite simply realised in discrete terms as:
$$y[n] = x[n] + \sum_{k=1}^{P} a_k \left\{ \zeta_2^{k}\, y[n-k] - \zeta_1^{k}\, x[n-k] \right\}. \tag{7.9}$$
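Equation (7.9) maps directly onto a sample-by-sample loop. The sketch below is one possible rendering, assuming that all samples before n = 0 are zero; the function name `pewf` and its argument order are choices made here for illustration.

```python
def pewf(x, a, zeta1, zeta2):
    """Filter x through W(z) using the difference form of Equation (7.9).

    `a` holds a_1..a_P; samples before n = 0 are assumed to be zero.
    """
    order = len(a)
    y = []
    for n in range(len(x)):
        acc = x[n]
        # Only taps that reach back to existing samples contribute.
        for k in range(1, min(order, n) + 1):
            acc += a[k - 1] * (zeta2 ** k * y[n - k] - zeta1 ** k * x[n - k])
        y.append(acc)
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    # First-order toy example with illustrative values, not from the text.
    print(pewf(impulse, [0.5], 0.5, 1.0))
```

When the two factors are equal, the feedback and feedforward terms cancel and the output equals the input, which is a quick sanity check on any implementation.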
This can be applied at the output of a CELP speech decoder to slightly enhance the
intelligibility of voiced speech, and can also be used within the CELP encoder to accentuate
the importance of any formant regions within the codebook search loop (i.e. to
weight the mean-squared matching process toward any formants that may be present).
Typically the values of $\zeta_1$ and $\zeta_2$ are very close to unity. In the past the author has used
$\zeta_1 = 0.95$ and $\zeta_2 = 1.0$ for several systems.
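To see the effect of these values, the magnitude of the weighting filter in Equation (7.6) can be evaluated on the unit circle. The sketch below assumes a hypothetical single resonance (a pole pair at radius 0.9 and angle π/4 radians per sample) standing in for a formant; the function name `pewf_gain` and the resonance parameters are illustrative, not taken from the text.

```python
import cmath
import math


def pewf_gain(a, zeta1, zeta2, omega):
    """Magnitude of W(z) from Equation (7.6) at z = exp(j*omega)."""
    z = cmath.exp(1j * omega)

    def h(zeta):
        # H(z/zeta) = sum over k of zeta**k * a_k * z**(-k), as in Equation (7.8)
        return sum((zeta ** k) * a_k * z ** (-k) for k, a_k in enumerate(a, start=1))

    return abs((1 - h(zeta1)) / (1 - h(zeta2)))


# Hypothetical single resonance: pole pair at radius 0.9, angle pi/4 rad/sample.
r, theta = 0.9, math.pi / 4
a = [2 * r * math.cos(theta), -r * r]

if __name__ == "__main__":
    print("gain at the resonance:", pewf_gain(a, 0.95, 1.0, theta))
    print("gain at omega = pi:   ", pewf_gain(a, 0.95, 1.0, math.pi))
```

For this toy resonance the gain is above unity at the resonant frequency and slightly below unity far from it, which is the formant emphasis described above.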
