Applied Speech and Audio Processing: With MATLAB Examples
Speech communications
Figure 5.16 A block diagram of part of a CELP encoder, showing original speech being decomposed into gain, LPC and LTP parameters.
Figure 5.17 A block diagram of the remainder of the CELP encoder. Gain, LPC and LTP parameters were obtained in the first part, shown in Figure 5.16, whilst the section now shown is devoted to determining the optimum codebook index that best matches the analysed speech.
5.4. Analysis-by-synthesis
Figure 5.18 A block diagram of the CELP decoder, which uses the codebook index, gain, LPC and LTP parameters to recreate a frame of speech.
…the quality of the system over RPE, we will consider in a little more detail exactly how it works. Following a particular analysis frame of speech through the CELP encoder, the basic gain, pitch and vocal tract parameters are first determined (Figure 5.16). These parameters are then used to recreate pseudo-speech as the output of the LPC synthesis filter (Figure 5.17). The first candidate vector in the codebook, named codeword 0, is used as the lung excitation. Amplification, LPC and LTP synthesis filters add gain, pitch and vocal tract information to this excitation to derive a frame of pseudo-speech, which is then compared with the original frame of speech. In fact, the comparison simply forms the difference vector between the two and perceptually weights it (something we will return to in Section 7.2). It then calculates the mean square over the frame, giving a single perceptual error value for the current input speech frame. The process is repeated for codeword 1, again yielding a single perceptual error value, and each remaining codeword in the codebook is then tried in turn. For the typical 1024-entry codebook shown, the result is 1024 perceptual error values. Each one measures the difference between the original speech and the pseudo-speech recreated with that codebook index.
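The encoder's search loop can be sketched in Python (the book itself uses MATLAB). This is a minimal sketch under several stated assumptions:

- Every filter starts from zero state in each frame. Real coders carry filter memory across frames.
- The LPC convention is A(z) = 1 - sum_k a_k z^-k.
- The LTP is a single-tap pitch predictor with gain `ltp_gain` and lag `ltp_lag`.
- The perceptual weighting filter is A(z)/A(z/gamma) with gamma = 0.9. This common CELP choice stands in for the weighting discussed in Section 7.2.

All function names and the test data (a random 1024-entry codebook, a hand-picked stable LPC filter) are hypothetical.

```python
import numpy as np


def all_pole(x, a):
    """Filter x through 1/A(z), A(z) = 1 - sum_k a[k-1] z^-k, from zero initial state."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * y[n - k]
        y[n] = acc
    return y


def synthesise(codeword, gain, a, ltp_gain, ltp_lag):
    """Gain -> LTP (pitch) synthesis -> LPC (vocal tract) synthesis."""
    x = gain * np.asarray(codeword, dtype=float)
    # LTP synthesis filter 1 / (1 - ltp_gain * z^-ltp_lag) adds pitch periodicity.
    p = np.zeros(len(x))
    for n in range(len(x)):
        p[n] = x[n] + (ltp_gain * p[n - ltp_lag] if n >= ltp_lag else 0.0)
    # LPC synthesis filter 1 / A(z) adds the vocal tract resonances.
    return all_pole(p, a)


def perceptual_weight(e, a, gamma=0.9):
    """Assumed weighting filter W(z) = A(z) / A(z/gamma)."""
    a = np.asarray(a, dtype=float)
    # FIR part A(z): taps [1, -a1, -a2, ...], truncated to the frame length.
    fir = np.convolve(e, np.concatenate(([1.0], -a)))[: len(e)]
    # All-pole part 1 / A(z/gamma): coefficients a_k * gamma^k.
    return all_pole(fir, a * gamma ** np.arange(1, len(a) + 1))


def search_codebook(target, codebook, gain, a, ltp_gain, ltp_lag):
    """Try every codeword; return the index with the smallest perceptual error."""
    errors = np.empty(len(codebook))
    for i, codeword in enumerate(codebook):
        pseudo = synthesise(codeword, gain, a, ltp_gain, ltp_lag)
        weighted = perceptual_weight(target - pseudo, a)
        errors[i] = np.mean(weighted ** 2)  # one perceptual error value per codeword
    return int(np.argmin(errors)), errors


# Hypothetical data: a random 1024-entry codebook of 40-sample codewords and a
# target frame built from codeword 517, so the search should recover index 517.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 40))
lpc = [1.2, -0.5]  # poles at radius ~0.71, so 1/A(z) is stable
target = synthesise(codebook[517], 2.0, lpc, 0.5, 20)
best, errors = search_codebook(target, codebook, 2.0, lpc, 0.5, 20)
print(best)  # 517
```

Using a target synthesised from a known codeword makes the behaviour easy to check: that codeword's error is exactly zero and every other candidate scores higher. Real speech will not match any codeword exactly, so the search simply picks the closest one.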
The codebook index that resulted in the smallest perceptual error value is therefore the one that best represents the original speech, and this index (0 to 1023) is transmitted from encoder to decoder. At the decoder, shown in Figure 5.18, the transmitted parameters are used to recreate a frame of decoded speech. The codeword selected in the decoder is the one identified by the transmitted codebook index. Since this codeword is identical to the one at the same position in the encoder's codebook, it is guaranteed to be the best of the candidate vectors (by the perceptual error measure). We would therefore expect the frame of decoded speech to be similar to the original analysed speech. The decoder is evidently much simpler than the encoder, by a factor of at least 1024 in the example shown. It does not need to run the synthesis once for every codebook entry, but only once, for the indicated entry. This advantage is offset somewhat by the trend to apply post-filtering to the decoder output (not shown here) in order to improve audio quality.
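A short Python sketch shows why the decoder is so much cheaper. It looks up the single codeword named by the transmitted index and runs the gain, LTP and LPC synthesis chain once, whereas the encoder ran that chain once per codebook entry.

The same assumptions apply as for an encoder-side sketch: zero filter state per frame, A(z) = 1 - sum_k a_k z^-k, and a single-tap LTP. Function names and parameter values are hypothetical. Generating the codebook from a fixed random seed is only a stand-in for the stored table that a real codec would share between encoder and decoder. Post-filtering is omitted, as it is in Figure 5.18.

```python
import math

import numpy as np

CODEBOOK_SIZE, FRAME_LEN = 1024, 40


def make_codebook(seed=1234):
    """Hypothetical fixed codebook: the same seed gives encoder and decoder identical tables."""
    return np.random.default_rng(seed).standard_normal((CODEBOOK_SIZE, FRAME_LEN))


def synthesise(codeword, gain, a, ltp_gain, ltp_lag):
    """Gain, then LTP synthesis 1/(1 - g z^-lag), then LPC synthesis 1/A(z)."""
    x = gain * np.asarray(codeword, dtype=float)
    p = np.zeros(len(x))  # output of the LTP (pitch) synthesis filter
    y = np.zeros(len(x))  # output of the LPC (vocal tract) synthesis filter
    for n in range(len(x)):
        p[n] = x[n] + (ltp_gain * p[n - ltp_lag] if n >= ltp_lag else 0.0)
        y[n] = p[n] + sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n >= k)
    return y


def decode_frame(index, codebook, gain, a, ltp_gain, ltp_lag):
    """Look up the transmitted index and run the synthesis chain exactly once."""
    return synthesise(codebook[index], gain, a, ltp_gain, ltp_lag)


encoder_codebook = make_codebook()
decoder_codebook = make_codebook()
index_bits = int(math.log2(CODEBOOK_SIZE))  # indices 0..1023 fit in 10 bits

# Parameters as they might arrive from the encoder (hypothetical values).
index, gain, lpc, ltp_gain, ltp_lag = 517, 2.0, [1.2, -0.5], 0.5, 20
decoded = decode_frame(index, decoder_codebook, gain, lpc, ltp_gain, ltp_lag)
encoder_pseudo = synthesise(encoder_codebook[index], gain, lpc, ltp_gain, ltp_lag)
print(index_bits, np.array_equal(decoded, encoder_pseudo))  # 10 True
```

Because both sides hold the same codebook, the decoded frame is identical to the pseudo-speech that the encoder scored for that index. The decoder performs one synthesis pass where the encoder performed 1024.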