Applied Speech and Audio Processing: With MATLAB Examples




5.2. Parameterisation
lines together and quantises these as a set. Typical sets may be a (2, 3, 3, 2) or a (3, 3, 4)
arrangement for a tenth-order system.
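The grouping of lines into sets such as (3, 3, 4) can be illustrated with a minimal sketch. It assumes a nearest-neighbour search under squared Euclidean distance, which the text does not specify. The function name `split_vq`, the toy LSP values (in radians) and the two-entry codebooks are all hypothetical and chosen only to keep the example short.

```python
# Hypothetical sketch: split vector quantisation of a tenth-order LSP vector.
def split_vq(lsp, split, codebooks):
    """Quantise `lsp` in consecutive groups whose sizes are given by `split`.

    Returns one codebook index per group and the reconstructed vector.
    """
    assert sum(split) == len(lsp)
    indices, recon, start = [], [], 0
    for size, book in zip(split, codebooks):
        sub = lsp[start:start + size]
        # Nearest codeword under squared Euclidean distance (an assumption).
        best = min(range(len(book)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, book[k])))
        indices.append(best)
        recon.extend(book[best])
        start += size
    return indices, recon


# Made-up LSP vector and tiny made-up codebooks (two codewords per group).
lsp = [0.30, 0.45, 0.80, 1.05, 1.30, 1.60, 1.95, 2.20, 2.55, 2.80]
split = (3, 3, 4)
codebooks = [
    [[0.25, 0.40, 0.75], [0.35, 0.60, 0.95]],
    [[0.90, 1.15, 1.45], [1.00, 1.35, 1.65]],
    [[1.90, 2.25, 2.50, 2.85], [2.00, 2.30, 2.60, 2.90]],
]
indices, recon = split_vq(lsp, split, codebooks)
```

Only the three indices need to be transmitted per frame. A practical codebook would be far larger and trained on speech data.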
Both scalar and vector quantisation can be applied either to the raw LSP values
themselves or to differential values, where the difference is taken either between a line’s
current position and its position in the previous frame, or between its current position
and its mean position [9]. We can refer to these as the short-term and long-term average
differences respectively [10].
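The two kinds of differential value can be sketched as follows, assuming a plain uniform scalar quantiser with a fixed step size. The function name `quantise_differential`, the step of 0.02 and all line positions are hypothetical illustration values, not taken from [9] or [10].

```python
# Hypothetical sketch: quantising LSP differences rather than raw positions.
def quantise_differential(current, reference, step):
    """Uniformly quantise (current - reference), line by line.

    Returns the integer codes and the positions the decoder would rebuild.
    """
    codes = [round((c - r) / step) for c, r in zip(current, reference)]
    recon = [r + k * step for r, k in zip(reference, codes)]
    return codes, recon


previous = [0.30, 0.62, 0.95]   # line positions in the previous frame (made up)
mean = [0.36, 0.70, 0.97]       # long-term mean positions (made up)
current = [0.345, 0.66, 0.91]   # line positions in the current frame (made up)
step = 0.02                     # assumed quantiser step size

# Short-term difference: reference is the previous frame.
st_codes, st_recon = quantise_differential(current, previous, step)
# Long-term difference: reference is the mean position of each line.
lt_codes, lt_recon = quantise_differential(current, mean, step)
```

In a real frame-to-frame scheme the reference would normally be the previously decoded frame rather than the unquantised one, so that encoder and decoder stay in step.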
An adaptation of long-term average differential quantisation (which uses the distance
between the current position and the mean position of a line) is to recalculate the nominal
position every frame, based on an even distribution of nominal positions between the
values of the first and last LSPs. This is known as Interpolated LSF (or LSF Interpolation,
LSFI) [11]. A different form of interpolation is that applied by the TETRA standard CELP
coder [12], which quantises LSPs that have been interpolated between subframes (of
which there are four per standard-sized frame). This approach can provide a degree of
immunity to the effects of subframes lost due to burst errors.
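One possible reading of the per-frame nominal positions described for LSFI is sketched below. It assumes the nominal positions are spaced linearly between the first and last LSP of the frame and that the values to be quantised are the deviations from those positions. The details of the method in [11] may differ, and the function names and LSP values are hypothetical.

```python
# Hypothetical sketch: per-frame nominal positions spaced evenly between
# the first and last LSP, with the deviation of each line from its nominal.
def interpolated_nominals(lsp):
    """Evenly spaced nominal positions from lsp[0] to lsp[-1] (assumed linear)."""
    first, last = lsp[0], lsp[-1]
    spacing = (last - first) / (len(lsp) - 1)
    return [first + i * spacing for i in range(len(lsp))]


def lsfi_residuals(lsp):
    """Deviation of each line from its recalculated nominal position."""
    return [x - nom for x, nom in zip(lsp, interpolated_nominals(lsp))]


lsp = [0.30, 0.45, 0.80, 1.05, 1.30, 1.60, 1.95, 2.20, 2.55, 2.80]  # made up
nominals = interpolated_nominals(lsp)
residuals = lsfi_residuals(lsp)
```

Under this reading the first and last residuals are zero by construction, so only the two end lines and the interior deviations would need to be coded.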
An effective quantisation scheme will generally either maximise the signal-to-quantisation-noise
ratio for typical signals, or minimise a more perceptually relevant distortion
measure. Such measures could be the commonly used spectral distortion (SD) value (see
Section 3.3.2) or similar variants. Some published examples are the LSP distance (LD)
[9], LSP spectral weighted distance measure (LPCW) [13], local spectral approximation
weights (LSAW) [14] and inverse harmonic mean weights (IHMW) [15].
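As a rough illustration of a spectral distortion measure, the sketch below uses a common form of SD: the root-mean-square difference, in dB, between the log power spectra of the original and quantised all-pole filters. It is an assumption that this matches the definition in Section 3.3.2. The function names, the 256-point frequency grid and the first-order coefficient sets are hypothetical.

```python
# Hypothetical sketch: RMS log-spectral difference between two LPC filters.
import cmath
import math


def lpc_power_spectrum_db(a, n_points=256):
    """Power spectrum in dB of the all-pole filter 1/A(z), sampled on (0, pi).

    `a` holds the coefficients [1, a1, ..., ap] of A(z).
    """
    spectrum = []
    for k in range(n_points):
        w = math.pi * (k + 0.5) / n_points
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        # 10*log10(1/|A|^2) = -20*log10|A|
        spectrum.append(-20.0 * math.log10(abs(A)))
    return spectrum


def spectral_distortion(a_ref, a_quant, n_points=256):
    """RMS difference in dB between the two log power spectra."""
    p = lpc_power_spectrum_db(a_ref, n_points)
    q = lpc_power_spectrum_db(a_quant, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)) / n_points)


a_ref = [1.0, -0.90]     # made-up first-order filter
a_quant = [1.0, -0.85]   # made-up quantised version of the same filter
sd = spectral_distortion(a_ref, a_quant)
```

A quantiser design loop could evaluate such a value over many frames and compare candidate codebooks by their average.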
In all cases, it is necessary to appreciate the dynamics of the signal to be quantised,
and optionally to assign different levels of importance to critical spectral regions, either
directly, or by allocating greater quantisation accuracy to LSPs with a frequency locus
within such critical regions. It is possible to match regions of spectral importance to LSP
accuracy through the selection of different quantiser resolutions for different lines. For
example, lines 9 and 10 in a tenth-order analysis would relate to formant F3, if present.
This formant can be considered less important to speech intelligibility than formants F1
and F2. Therefore lines 9 and 10 may be quantised with fewer bits than, for example,
lines 5 and 6.
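Assigning a different quantiser resolution to each line can be sketched with one uniform scalar quantiser per line. The per-line ranges and the bit allocation below are hypothetical values chosen only so that lines 9 and 10 receive fewer bits than lines 5 and 6. They are not taken from any published coder.

```python
# Hypothetical sketch: per-line uniform scalar quantisation with unequal bits.
def quantise_line(value, lo, hi, bits):
    """Quantise `value` to one of 2**bits uniform cells spanning [lo, hi].

    Returns the cell index and the cell midpoint used for reconstruction.
    """
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = int((value - lo) / step)
    idx = max(0, min(levels - 1, idx))  # clamp values at or beyond the range
    return idx, lo + (idx + 0.5) * step


# Made-up frequency ranges (radians) and bit allocation for ten lines.
ranges = [(0.1, 0.6), (0.3, 0.9), (0.5, 1.3), (0.8, 1.6), (1.1, 1.9),
          (1.4, 2.2), (1.7, 2.5), (2.0, 2.8), (2.3, 3.0), (2.6, 3.1)]
bits = [3, 4, 4, 4, 4, 4, 3, 3, 2, 2]
lsp = [0.30, 0.45, 0.80, 1.05, 1.30, 1.60, 1.95, 2.20, 2.55, 2.80]

quantised = [quantise_line(v, lo, hi, b)
             for v, (lo, hi), b in zip(lsp, ranges, bits)]
total_bits = sum(bits)
```

With this allocation a frame costs 33 bits, and the coarser cells of lines 9 and 10 give a larger worst-case error than the finer cells of lines 5 and 6.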
By plotting the LSP line frequency locus for a number of TIMIT speech recordings,
as shown in Figure 5.11, we can see that each line is confined to a fairly limited range
of frequencies. The figure shows which lines are located predominantly in frequency regions
of less importance to intelligibility: these are natural candidates for being quantised with
fewer bits than other lines. The plot was obtained through tenth-order LPC analysis on 40 ms
frames with 50% overlap for different male and female speakers. These LPC coefficients
were then transformed into LSP values, with the relative frequency of their values
computed across 40 analysis bins and then plotted on the vertical axis for each of the LSP
lines.
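The binning step described above can be sketched as follows, assuming the LSP values have already been computed for each frame and that the 40 bins span 0 to π radians uniformly. The text does not state the bin edges. The function name `lsp_histograms` and the three toy third-order frames are hypothetical stand-ins for the TIMIT data.

```python
# Hypothetical sketch: relative frequency of each LSP line across 40 bins.
import math


def lsp_histograms(frames, n_bins=40, f_max=math.pi):
    """Return one relative-frequency histogram per LSP line.

    `frames` is a list of LSP vectors, one per analysis frame.
    """
    order = len(frames[0])
    counts = [[0] * n_bins for _ in range(order)]
    for lsp in frames:
        for line, value in enumerate(lsp):
            b = min(n_bins - 1, int(value / f_max * n_bins))
            counts[line][b] += 1
    n_frames = len(frames)
    return [[c / n_frames for c in row] for row in counts]


# Three made-up frames of a third-order analysis, values in radians.
frames = [[0.30, 1.50, 2.90],
          [0.35, 1.40, 2.95],
          [0.60, 1.55, 3.00]]
hist = lsp_histograms(frames)
```

Each row of `hist` sums to one, so rows can be plotted on a common vertical scale, one curve per line.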

