Applied Speech and Audio Processing: With MATLAB Examples


5.4. Analysis-by-synthesis
be quite low – in fact it may be lower than the match between the original sinewave
and random noise. However, the perceived difference between the two sinewaves is probably
zero, whereas it is huge when one of the signals is random noise.
Therefore, most practical analysis-by-synthesis algorithms use a perceptual matching
criterion: either a perceptual weighting filter or something like a spectral distortion
measure (see Section 3.3.2), rather than a mean-squared match.
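The weakness of a mean-squared match can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the 200 Hz tone, the 8 kHz sample rate, the half-cycle phase shift and the power-matched Gaussian noise are all choices made here, not values taken from the text. It compares the mean-squared error between a sinewave and a phase-shifted copy of itself with the error between the same sinewave and noise.

```python
import math
import random

def mse(a, b):
    """Mean-squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

N = 8000       # number of samples (assumed)
FS = 8000.0    # sample rate in Hz (assumed)
F0 = 200.0     # sinewave frequency in Hz (assumed)

original = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]

# The same sinewave shifted by half a cycle (assumed shift).
shifted = [math.sin(2 * math.pi * F0 * n / FS + math.pi) for n in range(N)]

# Gaussian noise scaled to the same average power as the sinewave (0.5).
rng = random.Random(0)
noise = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(N)]

mse_shifted = mse(original, shifted)  # 4 * mean(sin^2), i.e. about 2.0
mse_noise = mse(original, noise)      # roughly 0.5 + 0.5 = 1.0

print("sinewave vs shifted sinewave:", mse_shifted)
print("sinewave vs noise:           ", mse_noise)
```

Under these assumptions the phase-shifted sinewave scores a worse mean-squared match than the noise does, even though it is the noise that would be heard as completely different.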
5.4.1 Basic CELP
CELP is the logical culmination of an evolutionary process in speech compression algorithms:
it can provide excellent quality speech at low bitrates and is a common choice
for speech products. It utilises a source-filter model of speech, parameterised as we have
seen with gain, vocal tract, pitch and lung excitation information.
CELP stands for either Code Excited Linear Prediction or Codebook Excited Linear
Prediction depending on whom you ask. What is certain, though, is that the technique
collectively describes quite a variety of similarly structured algorithms. We will begin
with the basic structure, and subsequently look briefly at algebraic, adaptive and split
variants. Be aware that this has been an area of intense research activity for over a decade
now: many mutant forms have emerged in the research literature.
We will start with the basic CELP encoder, designed to decompose a speech signal into
various parameters. A block diagram of such a system is shown in Figure 5.16. This shows
the speech signal being filtered (including normalisation, yielding gain information),
segmented, and windowed. It is then analysed for pitch components (represented by
LTP parameters) and vocal tract resonances (represented by LPC coefficients). Readers
may note the similarity to the RPE system in Figure 5.15, and indeed both coders do
share many characteristics.
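The front-end steps described above (gain normalisation, segmentation, windowing and LPC analysis) might be sketched as follows. This is a minimal pure-Python illustration, not the book's implementation: the frame length, hop size, Hamming window, prediction order of 10 and the autocorrelation method with Levinson-Durbin recursion are all assumptions made here, the test signal is synthetic, and LTP (pitch) analysis is omitted.

```python
import math
import random

def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] = 1 and the prediction error is
    e[n] = sum_k a[k] * x[n - k]; err is the final error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Synthetic stand-in for speech: two tones plus a little noise (assumed).
FS = 8000
rng = random.Random(1)
signal = [math.sin(2 * math.pi * 300 * n / FS)
          + 0.5 * math.sin(2 * math.pi * 1200 * n / FS)
          + rng.gauss(0.0, 0.05) for n in range(800)]

FRAME_LEN, HOP, ORDER = 240, 160, 10   # 30 ms frames, 20 ms hop (assumed)
window = hamming(FRAME_LEN)

analysis = []                          # one (gain, r, a, err) tuple per frame
for frame in split_frames(signal, FRAME_LEN, HOP):
    gain = math.sqrt(sum(v * v for v in frame) / len(frame))  # RMS gain
    if gain == 0.0:
        continue                       # skip silent frames
    windowed = [(v / gain) * w for v, w in zip(frame, window)]
    r = autocorr(windowed, ORDER)
    a, err = levinson(r, ORDER)
    analysis.append((gain, r, a, err))

print(len(analysis), "frames analysed")
```

Each tuple in `analysis` holds the gain and LPC coefficients that such an encoder would go on to quantise for that frame.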
Where the CELP and RPE systems differ most is in the handling of the original
lung excitation signal. RPE treats this as either white Gaussian noise, or as a pulse-train.
The CELP coder takes a different approach: it utilises a large codebook of candidate
vectors at both encoder and decoder, and essentially runs through an iterative process to
attempt to identify which of the candidate excitation vectors best represents the actual
lung excitation.
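The codebook search described above can be sketched as an exhaustive loop: each candidate vector is passed through the LPC synthesis filter, scaled by its best-fitting gain, and compared with the target. The sketch below is a simplification under stated assumptions: the random Gaussian codebook, its size, the vector length and the one-pole filter coefficients are all hypothetical, the filter memory is reset for every candidate, the LTP stage is left out, and a plain squared-error match is used for brevity where a practical coder would apply a perceptual criterion.

```python
import random

def synthesise(excitation, a):
    """All-pole synthesis filter 1/A(z) with zero initial state:
    y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def search_codebook(target, codebook, a):
    """Return (index, gain, error) for the codebook vector whose synthesised,
    gain-scaled output is closest to target in squared error."""
    best = (None, 0.0, float("inf"))
    for idx, vec in enumerate(codebook):
        y = synthesise(vec, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                       # an all-zero vector cannot match
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy               # least-squares optimal gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (idx, gain, err)
    return best

# Hypothetical codebook shared by encoder and decoder.
rng = random.Random(42)
VECTOR_LEN, CODEBOOK_SIZE = 40, 64
codebook = [[rng.gauss(0.0, 1.0) for _ in range(VECTOR_LEN)]
            for _ in range(CODEBOOK_SIZE)]

a = [1.0, -0.9]   # assumed LPC coefficients (a stable one-pole filter)

# Build a target from a known entry so the search result can be checked.
target = synthesise([2.5 * v for v in codebook[17]], a)

index, gain, error = search_codebook(target, codebook, a)
print(index, gain, error)
```

Only `index` and a quantised `gain` would need to be transmitted, since the decoder holds the same codebook and can regenerate the excitation itself.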
At least that is the theory – in practice none of the parameters characterises the
required information perfectly. This means that both the LPC and LTP representations,
neither being perfect, will ‘leak’ information. In the RPE encoder, vocal tract information
not caught by the LPC analysis is unlikely to be picked up by the LTP and RPE
analysis, and so that information will be lost to the encoder, and consequently not transmitted
to the decoder. This contributes to loss of quality in speech processed by such a
system.
In CELP, the codebook of candidate excitation vectors can often pick up some of
the information which was not caught by the LTP and LPC analysis. So in practice the
codebook does not just model lung excitation. Since this mechanism greatly improves
