Applied Speech and Audio Processing: With MATLAB Examples
5.4. Analysis-by-synthesis
be quite low; in fact it may be lower than the match between the original sinewave and random noise. However, the perceived difference between the two sinewaves is probably zero, whereas it is huge when one of the signals is random noise. Therefore, most practical analysis-by-synthesis algorithms use a perceptual matching criterion, either a perceptual weighting filter or something like a spectral distortion measure (see Section 3.3.2), rather than a mean-squared match.

5.4.1 Basic CELP

CELP is the logical culmination of an evolutionary process in speech compression algorithms: it can provide excellent quality speech at low bitrates and is a common choice for speech products. It utilises a source-filter model of speech, parameterised, as we have seen, with gain, vocal tract, pitch and lung excitation information.

CELP stands for either Code Excited Linear Prediction or Codebook Excited Linear Prediction, depending on whom you ask. What is certain, though, is that the name collectively describes quite a variety of similarly structured algorithms. We will begin with the basic structure, and subsequently look briefly at algebraic, adaptive and split variants. Be aware that this has been an area of intense research activity for over a decade now: many mutant forms have emerged in the research literature.

We will start with the basic CELP encoder, designed to decompose a speech signal into various parameters. A block diagram of such a system is shown in Figure 5.16. This shows the speech signal being filtered (including normalisation, yielding gain information), segmented, and windowed. It is then analysed for pitch components (represented by LTP parameters) and vocal tract resonances (represented by LPC coefficients). Readers may note the similarity to the RPE system in Figure 5.15, and indeed both coders do share many characteristics. Where the CELP and RPE systems differ most greatly is in the handling of the original lung excitation signal.
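The analysis stages described above (gain normalisation, segmentation, windowing, then LPC and LTP analysis) can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the encoder of Figure 5.16: the function names, the Hamming window, the frame sizes and the autocorrelation method with a Levinson-Durbin recursion are all choices made here for brevity, and the pitch lag is estimated directly from the signal rather than from an LPC residual or weighted speech as many coders do.

```python
import math

def normalise(signal):
    """Scale to unit peak; the scale factor stands in for the gain parameter."""
    gain = max(abs(s) for s in signal) or 1.0
    return [s / gain for s in signal], gain

def frames(signal, size, hop):
    """Segment into overlapping frames and apply a Hamming window (assumed choice)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
              for n in range(size)]
    for start in range(0, len(signal) - size + 1, hop):
        yield [signal[start + n] * window[n] for n in range(size)]

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: returns a[0..order] with a[0] = 1,
    the coefficients of A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def ltp_lag(x, min_lag, max_lag):
    """Pitch lag that maximises the normalised correlation of x with its past."""
    best_lag, best_score = min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        den = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
        if num > 0.0 and den > 0.0 and num * num / den > best_score:
            best_lag, best_score = lag, num * num / den
    return best_lag

if __name__ == "__main__":
    # Hypothetical test signal: two harmonics with a 40-sample period.
    speech = [3.0 * (math.sin(2 * math.pi * n / 40)
                     + 0.5 * math.sin(4 * math.pi * n / 40))
              for n in range(240)]
    scaled, gain = normalise(speech)
    for frame in frames(scaled, size=160, hop=80):
        print(levinson(autocorr(frame, 10), 10))
    print("gain:", gain, "pitch lag:", ltp_lag(scaled, 20, 60))
```

The gain, the LPC coefficients and the pitch lag are the kinds of parameter such an encoder would go on to quantise; the quantisation itself is left out of this sketch.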
RPE treats the lung excitation as either white Gaussian noise or a pulse train. The CELP coder takes a different approach: it utilises a large codebook of candidate vectors at both encoder and decoder, and essentially runs through an iterative process to identify which of the candidate excitation vectors best represents the actual lung excitation.

At least that is the theory; in practice none of the parameters characterises the required information perfectly. This means that both the LPC and LTP representations, neither being perfect, will ‘leak’ information. In the RPE encoder, vocal tract information not caught by the LPC analysis is unlikely to be picked up by the LTP and RPE analysis, so that information will be lost to the encoder and consequently not transmitted to the decoder. This contributes to loss of quality in speech processed by such a system. In CELP, the codebook of candidate excitation vectors can often pick up some of the information which was not caught by the LTP and LPC analysis. So in practice the codebook does not just model lung excitation. Since this mechanism greatly improves
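The codebook search described above, in which each candidate excitation vector is tried in turn and the one that best reproduces the speech is kept, can be illustrated with a minimal Python sketch. Several simplifying assumptions apply: the codebook is filled with random Gaussian vectors, only an LPC synthesis filter is used (the LTP stage is left out), the filter starts from a zero state, and the match is a plain mean-squared one rather than the perceptually weighted criterion that practical coders use. The names `synthesise`, `search_codebook` and `make_codebook` are hypothetical.

```python
import random

def synthesise(excitation, a):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + a[1] z^-1 + ...,
    started from a zero state (a simplification)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, a):
    """Try every candidate; return (index, gain, error) of the best match.

    For each candidate the gain that minimises the squared error is computed
    in closed form, then the candidate with the smallest error is kept.
    """
    best = (None, 0.0, sum(t * t for t in target))
    for index, candidate in enumerate(codebook):
        y = synthesise(candidate, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best[2]:
            best = (index, gain, error)
    return best

def make_codebook(entries, length, seed=0):
    """Random Gaussian codebook; the same seed gives the same codebook,
    standing in for the copy held at both encoder and decoder."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(entries)]

if __name__ == "__main__":
    codebook = make_codebook(entries=64, length=40, seed=1)
    lpc = [1.0, -0.9]                      # hypothetical one-pole vocal tract
    # Build a target that is exactly entry 17, scaled, so the search can find it.
    target = [2.5 * v for v in synthesise(codebook[17], lpc)]
    print(search_codebook(target, codebook, lpc))
```

Because both ends hold the same codebook, only the winning index and gain need to be sent alongside the filter parameters. Real speech will never match an entry exactly, so the residual error reflects whatever the chosen vector failed to capture.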