Applied Speech and Audio Processing: With MATLAB Examples

Speech communications

5.2. Parameterisation
The set of P linear equations can thus be represented as the following matrix equation:

$$
\begin{bmatrix}
R(0) & R(1) & R(2) & \cdots & R(P-1) \\
R(1) & R(0) & R(1) & \cdots & R(P-2) \\
R(2) & R(1) & R(0) & \cdots & R(P-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R(P-1) & R(P-2) & R(P-3) & \cdots & R(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_P \end{bmatrix}
=
\begin{bmatrix} R(1) \\ R(2) \\ R(3) \\ \vdots \\ R(P) \end{bmatrix}.
\tag{5.14}
$$
In practice a window, usually Hamming (Section 2.4.2), is applied to the input speech
prior to calculating the autocorrelation functions, and the entire autocorrelation results
are usually normalised by dividing by R(0) first. These normalised coefficients are
denoted r(i).
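As a rough illustration of the steps just described, the sketch below windows one frame,
forms R(0) to R(P), normalises them to r(i), and solves Equation (5.14) with a direct
linear solve. The synthetic test frame, the 8 kHz sampling rate, the order P = 10 and all
variable names are assumptions made for this example rather than values taken from the
text. The Hamming window is written out explicitly so that no toolbox function is needed.

```matlab
fs = 8000;                          % assumed sampling rate in Hz
N  = round(0.020 * fs);             % 20 ms frame -> 160 samples
P  = 10;                            % assumed prediction order

% Hypothetical voiced-like frame: pulse train through a stable all-pole filter
excitation = zeros(N, 1);
excitation(1:40:N) = 1;
x = filter(1, conv([1 -1.6 0.9], [1 0.4 0.7]), excitation);

% Hamming window, written out so that no toolbox function is required
n  = (0:N-1).';
w  = 0.54 - 0.46 * cos(2*pi*n / (N-1));
xw = x .* w;

% Autocorrelation R(0)..R(P); MATLAB indexing stores R(lag) at R(lag+1)
R = zeros(P+1, 1);
for lag = 0:P
    R(lag+1) = sum(xw(1:N-lag) .* xw(1+lag:N));
end

r = R / R(1);                       % normalised values, so r(0) = 1

% Equation (5.14) in normalised form: Toeplitz matrix times a equals r(1..P)
a = toeplitz(r(1:P)) \ r(2:P+1);
```

Because both sides of Equation (5.14) are divided by the same R(0), solving with r(i)
gives the same coefficients as solving with R(i). The direct solve corresponds to the
brute force option; it takes no advantage of the Toeplitz structure.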
Standard techniques exist for the matrix solution, including brute force matrix inversion,
the famous Durbin–Levinson–Itakura method, or the Le Roux method, which is
slightly less efficient but is a compact and easily followed recursive formula [3]:

$$ k_{n+1} = \frac{e^{n}_{n+1}}{e^{n}_{0}} \quad \text{for } n = 0, \ldots, P \tag{5.15} $$

$$ e^{n+1}_{0} = e^{n}_{0} - k_{n+1}\, e^{n}_{n+1} = e^{n}_{0}\,(1 - k_{n+1}^{2}) \tag{5.16} $$

$$ e^{n+1}_{i} = e^{n}_{i} - k_{n+1}\, e^{n}_{n+1-i} \quad \text{for } i = n, \ldots, P \tag{5.17} $$

where the initial conditions for the recursion are set to $e^{0}_{i} = R(i)$ for each i in the set of
P equations.
The values of k that are obtained from the Le Roux method are the reflection coefficients.
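The recursion of Equations (5.15) to (5.17) might be sketched as follows. Two points are
assumptions of this sketch rather than statements from the text: the term with index
n + 1 − i can fall on a negative lag, so the array is initialised for lags −P to P using
the symmetry R(−i) = R(i); and the loop runs over n = 0, ..., P − 1, which yields exactly
the P coefficients k_1 to k_P. The autocorrelation values and variable names are
hypothetical.

```matlab
R = [1.0 0.5 0.2 0.05];      % hypothetical values R(0)..R(P)
P = numel(R) - 1;

lags = -P:P;
off  = P + 1;                % lag i is stored at e(i + off)
e    = R(abs(lags) + 1);     % e^0_i = R(|i|), assuming R(-i) = R(i)

k = zeros(1, P);             % reflection coefficients k_1..k_P
for n = 0:P-1
    k(n+1) = e(off + n + 1) / e(off);            % Equation (5.15)
    i = (n + 1 - P):P;                           % lags whose operands are stored
    % Right-hand side is evaluated in full before assignment, so it uses e^n
    e(off + i) = e(off + i) - k(n+1) * e(off + n + 1 - i);   % (5.16) and (5.17)
end
```

The update at lag 0 is Equation (5.16), so e(off) holds the remaining prediction error
energy after each step. The range of usable lags shrinks by one per iteration, which is
why only lags from n + 1 − P upwards are updated.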
5.2.3 Converting between reflection coefficients and LPCs
Converting the reflection coefficients (which in some cases are quantised and
transmitted from speech encoder to decoder) into the LPC parameters required by
the LPC synthesis filter to recreate encoded speech is not difficult. The relationship
is shown in Equation (5.18) for all P coefficients, where the notation $a^{i}_{j}$ indicates the jth
LPC coefficient at step i of the recursion:

$$ a^{i}_{j} = a^{(i-1)}_{j} + k_{i}\, a^{(i-1)}_{(i-j)} \quad \text{with } 1 \le j \le i-1. \tag{5.18} $$

At each step the newest coefficient is set directly from the reflection coefficient,
$a^{i}_{i} = k_{i}$, and the recursion runs for i from 1 to P.
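A minimal sketch of Equation (5.18) is given below. It assumes that the newest
coefficient at each step is set as a^i_i = k_i and that the recursion runs for i = 1 to P;
the reflection coefficient values and variable names are hypothetical.

```matlab
k = [0.5 -0.3 0.2];          % hypothetical reflection coefficients k_1..k_P
P = numel(k);

a = zeros(1, P);             % a(j) holds a_j at the current step
for i = 1:P
    prev = a;                                % coefficients from step i-1
    for j = 1:i-1
        a(j) = prev(j) + k(i) * prev(i-j);   % Equation (5.18)
    end
    a(i) = k(i);                             % assumed: a^i_i = k_i
end
% a now holds the order-P coefficients a_1..a_P
```

The copy in `prev` matters: Equation (5.18) uses only values from step i − 1, so updating
`a` in place without it would mix old and new coefficients.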
In order to perform the reverse conversion from LPC parameters into reflection
coefficients, we start with the initial value:

$$ k_{i} = a^{i}_{i} $$

and then follow with:

$$ a^{(i-1)}_{j} = \frac{a^{i}_{j} - a^{i}_{i}\, a^{i}_{(i-j)}}{1 - k_{i}^{2}} \quad \text{with } 1 \le j \le i-1 \tag{5.19} $$

where Equation (5.19) is repeated with i decreasing from P to 1, with initial conditions
of $a^{P}_{j} = a_{j}$ for all j between 1 and P.
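The reverse conversion might be sketched as below, reading the initial value as
k_i = a^i_i (the term that appears in the numerator of Equation (5.19)). The coefficient
values and variable names are hypothetical, and the sketch assumes every |k_i| is less
than 1, since the denominator vanishes otherwise.

```matlab
a = [0.29 -0.23 0.2];        % hypothetical LPC coefficients a_1..a_P
P = numel(a);

k   = zeros(1, P);
cur = a;                     % initial condition: a^P_j = a_j
for i = P:-1:1
    k(i) = cur(i);                           % k_i = a^i_i
    prev = zeros(1, P);
    for j = 1:i-1
        % Equation (5.19): step the coefficient set down from order i to i-1
        prev(j) = (cur(j) - cur(i) * cur(i-j)) / (1 - k(i)^2);
    end
    cur = prev;
end
% k now holds the reflection coefficients k_1..k_P
```

Feeding the result back through Equation (5.18) should reproduce the original
coefficients, which makes a convenient round-trip check.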
5.2.4 Line spectral pairs
Line spectral pairs (LSPs) are a direct mathematical transformation of the set of LPC
parameters, and are generated within many speech compression systems, such as the
more modern CELP coders (which will be discussed later in Section 5.4.1). LSPs are
popular because of their excellent quantisation characteristics and consequent efficiency
of representation. They are also commonly referred to as line spectral frequencies
(LSFs) [3].
LSPs collectively describe the two resonance conditions arising from an interconnected
tube model of the human vocal tract. This includes mouth shape and nasal cavity,
and forms the basis of the underlying physical relevance of the linear prediction
representation. The two resonance conditions describe the modelled vocal tract as being
either fully open or fully closed at the glottis respectively (compare this to the model
of the reflection coefficients in Section 5.2.2). The model in question is constructed
from a set of equal length but different diameter tubes, so the two conditions mean the
source end is either closed or open respectively. The two conditions give rise to two sets
of resonant frequencies, with the number of resonances in each set being determined
by the number of joined tubes (which is defined by the order of the analysis system).
The resonances of each condition give rise to odd and even line spectral frequencies
respectively, and are interleaved into a set of LSPs which have monotonically increasing
value.
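The interleaving can be seen numerically with a short sketch. It uses the standard
construction of a sum and a difference polynomial from the LPC polynomial, which is not
taken from this excerpt, and it assumes the sign convention A(z) = 1 + a_1 z^-1 + ... +
a_P z^-P with a minimum-phase A(z). The coefficient values, the 8 kHz sampling rate and
the names Psum and Qdif are hypothetical. Which of the two sets corresponds to the open
or the closed glottis condition is not asserted here.

```matlab
A = [1 -1.2 0.96 -0.76 0.63];   % hypothetical minimum-phase A(z), order 4
p = numel(A) - 1;

% Sum and difference polynomials of order p+1 (palindromic / antipalindromic)
Psum = [A 0] + [0 fliplr(A)];
Qdif = [A 0] - [0 fliplr(A)];

% Their roots lie on the unit circle; keep angles strictly between 0 and pi,
% which discards the trivial roots at z = -1 and z = +1 and the conjugates
tol = 1e-6;
wP = angle(roots(Psum));
wQ = angle(roots(Qdif));
wP = sort(wP(wP > tol & wP < pi - tol));
wQ = sort(wQ(wQ > tol & wQ < pi - tol));

lsp    = sort([wP; wQ]);        % line spectral frequencies in radians
fs     = 8000;                  % assumed sampling rate in Hz
lsp_hz = lsp * fs / (2*pi);     % the same frequencies in Hz
```

For this even-order example each polynomial contributes p/2 frequencies, and sorting the
combined set alternates between the two, which is the monotonically increasing,
interleaved ordering described above.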
In reality, however, the human glottis opens and closes rapidly during speech; it
is neither fully closed nor fully open. Hence actual resonances occur at frequencies
located somewhere between the two extremes of each LSP condition. Nevertheless, this
relationship between resonance and LSP position lends a significant physical basis to
the representation. Figure 5.10 illustrates LSPs overlaid on an LPC spectral plot (made
using the lpcsp() MATLAB function given later in Section 5.2.4.3). The 10 vertical
lines were drawn at the LSP frequencies, and show the odd and even frequencies being
interleaved. Both the LSPs and the spectrum were derived from the same set of tenth-order
linear prediction parameters, which were obtained from a linear predictive analysis
of a 20 ms voiced speech frame.
Notable features of Figure 5.10 include the natural interleaving of the odd and even
LSP frequencies, and the fact that spectral peaks tend to be bracketed by a narrow
pair of lines (explained by the comment previously indicating that the actual resonance
