Applied Speech and Audio Processing: With MATLAB Examples

Date: 18.10.2023

5.4. Analysis-by-synthesis

Speech communications
Figure 5.16
A block diagram of part of a CELP encoder, showing original speech being
decomposed into gain, LPC and LTP parameters.
Figure 5.17
A block diagram of the remainder of the CELP encoder. Gain, LPC and LTP
parameters were obtained in the first part shown in Figure 5.16, whilst the section now shown is
devoted to determining the optimum codebook index that best matches the analysed speech.
the quality of the system over RPE, we will consider in a little more detail exactly how
it works.

Figure 5.18
A block diagram of the remainder of the CELP decoder utilising codebook index,
gain, LPC and LTP parameters to recreate a frame of speech.
Following a particular analysis frame of speech through the CELP encoder, first the
basic gain, pitch and vocal tract parameters are determined (shown in Figure 5.16), and
then these parameters are used to recreate pseudo-speech as in Figure 5.17, as the output
from the LPC synthesis filter. The first candidate vector in the codebook, named codeword
0, is used as the lung excitation. Amplification adds gain, the LTP synthesis filter adds
pitch, and the LPC synthesis filter adds vocal tract information to the lung excitation,
together deriving a frame of pseudo-speech.
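The encoder's synthesis path for a single codeword can be sketched in Python as below (the book's own examples are in MATLAB; this is an independent illustration, not the book's code). It assumes a one-tap LTP (pitch) synthesis filter, y[n] = x[n] + b*y[n-T], an all-pole LPC synthesis filter 1/A(z) with A(z) = 1 + a1*z^-1 + ... + ap*z^-p, and zero filter memory at the start of the frame, whereas a practical coder carries filter state from frame to frame. All function names and parameter values are hypothetical.

```python
from typing import List, Sequence


def ltp_synthesis(x: Sequence[float], lag: int, b: float) -> List[float]:
    """One-tap long-term (pitch) synthesis filter: y[n] = x[n] + b * y[n - lag].

    Assumes zero filter memory before the frame starts (a simplification).
    """
    y: List[float] = []
    for n, xn in enumerate(x):
        delayed = y[n - lag] if n >= lag else 0.0
        y.append(xn + b * delayed)
    return y


def lpc_synthesis(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole LPC synthesis filter 1/A(z), A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.

    Computes s[n] = e[n] - sum_k a[k-1] * s[n-k], again with zero initial memory.
    """
    s: List[float] = []
    for n, en in enumerate(e):
        acc = en
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * s[n - k]
        s.append(acc)
    return s


def synthesize_pseudo_speech(codeword, gain, lag, b, a):
    """Gain, then LTP (pitch) synthesis, then LPC (vocal tract) synthesis."""
    excitation = [gain * c for c in codeword]   # amplification
    pitched = ltp_synthesis(excitation, lag, b)  # adds pitch periodicity
    return lpc_synthesis(pitched, a)             # adds vocal tract spectral shape


if __name__ == "__main__":
    codeword_0 = [1.0] + [0.0] * 9  # a unit impulse standing in for "codeword 0"
    frame = synthesize_pseudo_speech(codeword_0, gain=2.0, lag=4, b=0.5, a=[-0.9])
    print([round(v, 3) for v in frame])
```

Feeding a unit impulse through the chain makes the role of each stage easy to see: the LTP filter repeats the scaled impulse every `lag` samples, and the LPC filter spreads each repetition into a decaying response.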
This pseudo-speech is compared to the original frame of speech. In fact, the comparison
simply finds a difference vector between the two, perceptually weights it (something
we will return to later in Section 7.2), and calculates its mean square over the frame:
a single perceptual error value for the current input speech frame. The process is then
repeated for codeword 1, again resulting in a single perceptual error value, and each
remaining element in the codebook is tried in turn.
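The per-codeword error measure can be sketched as below. The weighting filter is an assumption on our part: it uses the form W(z) = A(z)/A(z/gamma) with gamma = 0.9, a common choice in CELP coders, whereas the book defers its own treatment of perceptual weighting to Section 7.2. The function name is hypothetical, and filter memory is again assumed to be zero at the start of the frame.

```python
from typing import List, Sequence


def perceptual_error(original: Sequence[float], pseudo: Sequence[float],
                     a: Sequence[float], gamma: float = 0.9) -> float:
    """Mean square of the perceptually weighted difference between two frames.

    Weighting filter (assumed): W(z) = A(z) / A(z/gamma), with
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p and zero initial memory.
    """
    if len(original) != len(pseudo) or not original:
        raise ValueError("frames must be non-empty and of equal length")
    diff = [o - p for o, p in zip(original, pseudo)]  # difference vector

    # FIR numerator A(z): v[n] = d[n] + sum_k a_k d[n-k]
    v: List[float] = []
    for n in range(len(diff)):
        acc = diff[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * diff[n - k]
        v.append(acc)

    # IIR denominator 1/A(z/gamma): w[n] = v[n] - sum_k a_k gamma^k w[n-k]
    w: List[float] = []
    for n in range(len(v)):
        acc = v[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * (gamma ** k) * w[n - k]
        w.append(acc)

    # A single number per frame: the mean square of the weighted difference
    return sum(x * x for x in w) / len(w)
```

With gamma = 1 the weighting filter reduces to W(z) = 1 and the function returns the plain mean square error; smaller values of gamma de-emphasise errors near the formant peaks, where the speech itself tends to mask them.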
For the typical codebook shown, which holds 1024 entries, the result will be 1024
perceptual error values. Each one measures the difference between the original speech
and the pseudo-speech recreated with that codebook index. The codebook index that yielded
the smallest perceptual error value therefore best represents the original speech, and
this index (0 to 1023, which fits in 10 bits) is transmitted from encoder to decoder.
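The exhaustive codebook search can be sketched as follows. To keep the example short and self-contained, it substitutes a plain (unweighted) mean square error for the perceptual error, uses a hypothetical random Gaussian codebook of 1024 vectors of 40 samples each, and uses made-up gain, pitch and LPC values; none of these values come from the book.

```python
import random
from typing import List, Sequence, Tuple

FRAME_LEN = 40        # samples per codeword (assumed)
CODEBOOK_SIZE = 1024  # matches the example codebook in the text


def synthesize(codeword, gain, lag, b, a) -> List[float]:
    """Gain, then one-tap LTP synthesis, then all-pole LPC synthesis (zero memory)."""
    x = [gain * c for c in codeword]
    y: List[float] = []
    for n, xn in enumerate(x):  # y[n] = x[n] + b*y[n-lag]
        y.append(xn + (b * y[n - lag] if n >= lag else 0.0))
    s: List[float] = []
    for n, yn in enumerate(y):  # s[n] = y[n] - sum_k a_k s[n-k]
        s.append(yn - sum(ak * s[n - k] for k, ak in enumerate(a, 1) if n >= k))
    return s


def mean_square_error(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


def search_codebook(target, codebook, gain, lag, b, a) -> Tuple[int, float]:
    """Analysis-by-synthesis: try every codeword and keep the index with least error."""
    best_index, best_error = -1, float("inf")
    for index, codeword in enumerate(codebook):
        candidate = synthesize(codeword, gain, lag, b, a)
        error = mean_square_error(target, candidate)
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


rng = random.Random(1)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)] for _ in range(CODEBOOK_SIZE)]
params = dict(gain=0.8, lag=25, b=0.6, a=[-1.3, 0.6])  # made-up frame parameters
target = synthesize(codebook[737], **params)            # toy "original" frame

index, error = search_codebook(target, codebook, **params)
bits = (CODEBOOK_SIZE - 1).bit_length()                 # 1023 needs 10 bits
print(f"best index {index}, error {error:.3g}, sent with {bits} bits")
```

Because the toy target frame was built from codeword 737 with the same parameters, the search returns index 737 with zero error; with real speech no codeword matches exactly, and the search simply returns the closest candidate. The loop runs the full synthesis once per codeword, 1024 times per frame, which is where most of the encoder's effort goes.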
At the decoder, shown in Figure 5.18, the transmitted parameters are used to recreate
a frame of decoded speech. The decoder selects the codeword identified by the
transmitted codebook index. This codeword is identical to the one at the same position
in the encoder's codebook, and is thus guaranteed to be the best of the candidate vectors
according to the encoder's perceptual error measure. We would therefore expect the frame
of decoded speech to be similar to the original analysed speech. It is evident that the
decoder is considerably simpler than the encoder, by a factor of at least 1024 in the
example shown, since it does not need to repeat the synthesis once for each codebook
entry, but only for the indicated entry. This advantage is offset somewhat by the trend
to apply post-filtering to the decoder output (not shown here) in order to improve audio
quality.
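The asymmetry between encoder and decoder can be checked with the toy sketch below. It makes hypothetical simplifications (a random codebook shared by both ends, made-up gain, pitch and LPC values, zero filter memory, plain mean square error instead of perceptual weighting, and no post-filter) and counts how often each side runs the synthesis filters.

```python
import random
from typing import List, Tuple


class SynthesisCounter:
    """Gain, LTP and LPC synthesis (zero filter memory) that counts its own calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, codeword, gain, lag, b, a) -> List[float]:
        self.calls += 1
        x = [gain * c for c in codeword]
        y: List[float] = []
        for n, xn in enumerate(x):  # pitch (LTP) synthesis
            y.append(xn + (b * y[n - lag] if n >= lag else 0.0))
        s: List[float] = []
        for n, yn in enumerate(y):  # vocal tract (LPC) synthesis
            s.append(yn - sum(ak * s[n - k] for k, ak in enumerate(a, 1) if n >= k))
        return s


def encode(target, codebook, params, synth) -> Tuple[int, List[float]]:
    """Try every codeword; return the best index and its pseudo-speech."""
    best = (-1, float("inf"), [])
    for index, codeword in enumerate(codebook):
        candidate = synth(codeword, **params)
        err = sum((t - c) ** 2 for t, c in zip(target, candidate)) / len(target)
        if err < best[1]:
            best = (index, err, candidate)
    return best[0], best[2]


def decode(index, codebook, params, synth) -> List[float]:
    """Look up the transmitted index and run the synthesis filters once."""
    return synth(codebook[index], **params)


rng = random.Random(7)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(1024)]  # shared by both ends
params = {"gain": 1.1, "lag": 30, "b": 0.5, "a": [-0.9]}  # made-up side information
target = [v + rng.gauss(0.0, 0.05) for v in SynthesisCounter()(codebook[100], **params)]

enc_synth, dec_synth = SynthesisCounter(), SynthesisCounter()
index, encoder_best = encode(target, codebook, params, enc_synth)
decoded = decode(index, codebook, params, dec_synth)

print(f"index {index}: encoder ran synthesis {enc_synth.calls} times, decoder {dec_synth.calls}")
print("decoded frame identical to encoder's best candidate:", decoded == encoder_best)
```

The encoder performs 1024 syntheses for the frame and the decoder only one, and because both ends hold an identical codebook, the decoded frame is exactly the pseudo-speech the encoder selected.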
