Applied Speech and Audio Processing: With MATLAB Examples
Speech communications
Since the codebook is the differentiating factor in the CELP method, we will explore it in a little more detail. First, however, we need to remind ourselves of the need for quantisation: the raw LPC, LTP and gain parameters must all be quantised in some way. As we have seen in Section 5.2.5, LPC parameters are rarely used as-is. Within CELP coders they are generally transformed to line spectral pairs prior to being output from the encoder. In fact all of the parameters, with the probable exception of the codebook index, will be quantised, and in the process transformed in some way.
Remembering that the encoder incorporates the decoding process in its codebook search loop, it is important to note that the parameters used in this part of the encoder have already been quantised and then dequantised. The main reason is that an encoder using unquantised parameters may well find a different candidate excitation vector from the one it would choose if operating on quantised-dequantised parameters. Since the decoder has access only to the quantised-dequantised parameters when generating the output speech, the encoder must use the same values to ensure that the best possible speech is generated.
5.4.1.1 CELP codebooks
As mentioned previously, each codebook is populated by a number of codewords. These are used as candidate vectors within the encoder, where each candidate is examined in turn, and the candidate that results in the best matching speech frame is chosen. For a typical system that analyses speech in 20 ms frames at a sample rate of 8 kHz, the candidate vectors need to consist of 8000 × 0.02 = 160 samples. Quite often, pitch is analysed and represented in subframes that may be 5 ms long (four subframes per frame), and thus the LTP parameters change four times as often as the LPC parameters, but otherwise the processing structure remains unchanged.
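
The frame-length arithmetic and the "examine each candidate in turn" search described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the book's implementation: the function names (`samples_for`, `synthesise`, `search_codebook`) are hypothetical, the synthesis filter is a bare all-pole LPC filter with the convention A(z) = 1 + a1 z^-1 + ..., and the LTP stage and perceptual weighting of a real CELP coder are omitted. The coefficients passed in as `lpc_q` stand for the quantised-dequantised values, since those are the only ones the decoder will have.

```python
import random

SAMPLE_RATE_HZ = 8000


def samples_for(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples spanning duration_ms at the given sample rate."""
    return sample_rate_hz * duration_ms // 1000


def synthesise(excitation, lpc_q):
    """All-pole synthesis filter: y[n] = x[n] - sum_k a[k] * y[n - k].

    Assumption: lpc_q holds the quantised-dequantised coefficients
    a[1..P], i.e. the same values the decoder would use.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc_q, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc_q):
    """Try every codeword in turn; return (index, gain, squared_error)."""
    best = (None, 0.0, float("inf"))
    for index, codeword in enumerate(codebook):
        candidate = synthesise(codeword, lpc_q)
        energy = sum(c * c for c in candidate)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Gain that minimises the squared error for this candidate.
        gain = sum(t * c for t, c in zip(target, candidate)) / energy
        error = sum((t - gain * c) ** 2 for t, c in zip(target, candidate))
        if error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    frame_len = samples_for(20)    # 8000 * 0.02 samples per frame
    subframe_len = samples_for(5)  # four subframes per 20 ms frame
    rng = random.Random(0)
    # Tiny 16-entry codebook purely for demonstration.
    demo_codebook = [[rng.gauss(0.0, 1.0) for _ in range(frame_len)]
                     for _ in range(16)]
    demo_lpc_q = [-0.9]  # hypothetical single-pole filter
    demo_target = synthesise(demo_codebook[3], demo_lpc_q)
    print(frame_len, subframe_len)
    print(search_codebook(demo_target, demo_codebook, demo_lpc_q))
```

The exhaustive loop makes the cost of the search plain: every codeword is filtered and compared for every frame, which is why computational complexity becomes a concern as codebooks grow.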
In the original CELP technique [19,20], the candidate vectors in each codebook were generated by a random number generation algorithm, ‘seeded’ identically at encoder and decoder so that both would contain exactly the same vectors: at the most basic level, a set of 1024 vectors, each of 1 × 160 random numbers. More modern variants introduce some structure into the codebook [12], or allow the sum of two vectors to be used as an excitation candidate. Such techniques are known as split codebooks.
Many useful enhancements to basic CELP rely on the fact that the CELP encoder (Figure 5.17) actually contains a decoder as part of its structure. That means that, for the same codebook index, the encoder pseudo-speech is identical to the decoder output speech. This is exploited in allowing the encoder to predict the state of the decoder, something necessary for any adaptive system. One such system allows the codebook to adapt: ensuring that the encoder and decoder codebooks adapt equally (keep in step) is tricky, but necessary to ensure performance.
Although speech quality improvement has always been the main driving factor behind the advance of the CELP technique, computational complexity reduction has been another significant factor. A third factor has been minimisation of processing latency. Three well-known enhancements to CELP are now discussed that address the quality, computational complexity, and processing latency issues.
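
As a brief aside, the identically seeded random codebook and the two-vector excitation candidate described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the names `make_codebook`, `split_candidate` and `SHARED_SEED` are hypothetical, the random numbers are drawn from a unit-variance Gaussian, and the two summed vectors are taken from two separate codebooks. A real codec would also have to specify the generator algorithm itself, because the same seed gives the same vectors only when both ends run the same generator.

```python
import random


def make_codebook(seed, num_codewords=1024, vector_length=160):
    """Build a random codebook from a privately seeded generator."""
    rng = random.Random(seed)  # own generator, global state is untouched
    return [[rng.gauss(0.0, 1.0) for _ in range(vector_length)]
            for _ in range(num_codewords)]


def split_candidate(codebook_a, index_a, codebook_b, index_b):
    """Excitation candidate formed by adding one codeword from each codebook."""
    return [a + b for a, b in zip(codebook_a[index_a], codebook_b[index_b])]


SHARED_SEED = 12345  # hypothetical value agreed by encoder and decoder

# Encoder and decoder each build their codebook independently.
encoder_codebook = make_codebook(SHARED_SEED)
decoder_codebook = make_codebook(SHARED_SEED)

# Only the index needs to be transmitted: both ends hold the same vectors.
codebooks_match = encoder_codebook == decoder_codebook

# A split-codebook candidate built from two smaller codebooks.
small_a = make_codebook(SHARED_SEED + 1, num_codewords=32)
small_b = make_codebook(SHARED_SEED + 2, num_codewords=32)
candidate = split_candidate(small_a, 7, small_b, 19)
```

With two 32-entry codebooks, the pair of indices selects one of 32 × 32 = 1024 possible sums while only 64 vectors are stored, which hints at why adding structure of this kind is attractive.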