Applied Speech and Audio Processing: With MATLAB Examples
5.4. Analysis-by-synthesis
Figure 5.19 Timing diagram for basic CELP encoder and decoder processing, illustrating processing latency.

The first delay comes about due to the collection of audio samples into a buffer before they are processed: for a speech system working on a 30 ms analysis frame at a sample rate of 8 kHz, each frame contains 8000 × 0.03 = 240 samples. Processing typically cannot begin until all 240 samples are available, and naturally the final sample in the buffer was collected 30 ms after the first. Even with no other processing, such a system will delay audio by 30 ms. The output buffering arrangements might well affect the system in a similar way. These input-to-output latencies ignore the operation of any coder or decoder, and all propagation delays.

Next we look at the operation of the CELP encoder of Figure 5.17. On each of the 1024 passes around the loop, an entire decoding operation must be performed, followed by perceptual weighting and a mean-squared error calculation. None of this can begin until the sample input buffer has been filled, and the search that follows is likely to require a significant amount of processing time. This does, of course, depend on the clock and calculation speed of the underlying hardware.

To put this into perspective, the original inventors of CELP found that their Cray-1 supercomputer required 125 seconds of processing time to process just a single second of speech [20]. Of course, computers are far more powerful today than in the 1980s; however, such processing is still very far from instantaneous.

As CELP began to be adopted in real systems, latencies of 200–300 ms were observed. Unfortunately, at these latencies human conversations become rather strained: people begin to speak over one another, and feel uncomfortable with the long pauses in conversation. A useful rule of thumb is that most people will not notice a latency of 100–150 ms, but beyond this it starts to intrude on conversation.
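The buffering and processing figures quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the helper names (`frame_samples`, `encoder_latency_ms`, `noticeable`) are hypothetical, the 150 ms limit is the upper end of the 100–150 ms range mentioned in the text, and the model assumes that the encoder works on one isolated frame and that its processing time scales linearly with a "real-time factor" (seconds of processing per second of speech). Output buffering, decoding and propagation delays are ignored.

```python
def frame_samples(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of samples collected into one analysis frame."""
    return round(sample_rate_hz * frame_ms / 1000.0)


def encoder_latency_ms(frame_ms: float, realtime_factor: float) -> float:
    """Input buffering delay plus the time needed to encode one frame.

    realtime_factor is seconds of processing per second of speech
    (an assumed, simplified model of encoder speed).
    """
    buffering_ms = frame_ms                     # last sample arrives one frame after the first
    processing_ms = frame_ms * realtime_factor  # time spent on the codebook search
    return buffering_ms + processing_ms


def noticeable(latency_ms: float, limit_ms: float = 150.0) -> bool:
    """True if the latency exceeds the assumed conversational limit."""
    return latency_ms > limit_ms


SAMPLE_RATE_HZ = 8000
FRAME_MS = 30.0
CRAY1_FACTOR = 125.0  # 125 s of processing per 1 s of speech [20]

print(frame_samples(SAMPLE_RATE_HZ, FRAME_MS))       # 240
print(encoder_latency_ms(FRAME_MS, CRAY1_FACTOR))    # 3780.0
print(encoder_latency_ms(FRAME_MS, 1.0))             # 60.0
print(noticeable(encoder_latency_ms(FRAME_MS, 1.0))) # False
```

Under these assumptions, an encoder that only just keeps pace with the input (a factor of 1.0) already incurs 60 ms before any output buffering or transmission is counted. With any factor above 1.0, the encoder falls further behind with every frame, so the 3780 ms figure describes a single frame rather than a sustained stream.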
Clearly, a reduced-latency CELP was required, one that cut both the input/output buffering delay and the processing time. A solution was found and standardised as ITU G.728, boasting submillisecond processing latency. The primary technique used to achieve the latency reduction was the forward–backward structure. Despite its unusual name, forward–backward CELP simply refers to the order in which processing is performed. This is perhaps best illustrated in a timing diagram. First we