Applied Speech and Audio Processing: With MATLAB Examples
Speech communications
required to code more information, thus many algorithms give priority to only the most important aspects for intelligibility, namely the lower entries on the list.

5.3.1 Regular pulse excitation

Regular Pulse Excitation (RPE) is a parametric coder that represents the pitch component of speech. It is most famously implemented in ETSI standard 06.10, and is currently the primary mobile speech communications method for over a third of the world’s population, by any measure an impressive user base. This is due to its use in the GSM standard, developed in the 1980s as a pan-European digital voice standard. It was endorsed by the European Union, and quickly found adoption across Europe and then beyond.

GSM codes frames of 160 13-bit speech samples (at a sampling rate of 8 kHz) into 260 compressed bits. A decoder takes these bits and regenerates 160-sample output speech frames. There are many sources of information on GSM, not least the open standard documents, so there is no need to consider full details here. However, we will examine the pitch coding system for GSM 06.10, the traditional or ‘full rate’ standard.

In GSM, the original speech is analysed to determine vocal tract parameters (LPC coefficients), which are then used to filter the same vector of 160 speech samples to remove the vocal tract information, leaving a residual. The eight LPC coefficients are transformed into LARs (Log Area Ratios) for transmission.

The residual is then split into four subframes of 40 samples each. Each subframe is analysed separately to determine pitch parameters. The analysis is made on the current subframe concatenated with the three previous reconstituted subframes. The reconstituted subframes are those that have been generated from the previous pitch values – those that have been quantised for transmission. Thus they are effectively the subframes as generated by a decoder.
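The framing arithmetic and the subframe bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GSM 06.10 reference code: the function names (`frame_statistics`, `split_subframes`, `analysis_buffer`, `update_history`) are hypothetical, the history is assumed to start as all zeros, and the "reconstituted" subframes are simply passed in by the caller, since producing them would require the quantised pitch parameters that are not covered here.

```python
# Figures quoted in the text for GSM 06.10 full-rate coding.
FRAME_LEN = 160             # speech samples per frame
NUM_SUBFRAMES = 4           # the residual is split into four subframes
SUBFRAME_LEN = FRAME_LEN // NUM_SUBFRAMES   # 40 samples
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 13
CODED_BITS_PER_FRAME = 260


def frame_statistics():
    """Derive frame duration and bit rates from the figures above."""
    raw_bits = FRAME_LEN * BITS_PER_SAMPLE
    return {
        "frame_ms": 1000 * FRAME_LEN // SAMPLE_RATE_HZ,
        "raw_bits_per_frame": raw_bits,
        "raw_bit_rate": SAMPLE_RATE_HZ * BITS_PER_SAMPLE,
        "coded_bit_rate": CODED_BITS_PER_FRAME * SAMPLE_RATE_HZ // FRAME_LEN,
        "compression_ratio": raw_bits / CODED_BITS_PER_FRAME,
    }


def split_subframes(residual):
    """Split a 160-sample residual into four 40-sample subframes."""
    if len(residual) != FRAME_LEN:
        raise ValueError("expected a residual of %d samples" % FRAME_LEN)
    return [list(residual[i:i + SUBFRAME_LEN])
            for i in range(0, FRAME_LEN, SUBFRAME_LEN)]


def analysis_buffer(history, current):
    """Concatenate three previous reconstituted subframes with the current one.

    `history` holds the three most recent reconstituted subframes, oldest
    first. The result is 160 samples long, with the current subframe last.
    """
    if len(history) != NUM_SUBFRAMES - 1:
        raise ValueError("history must hold exactly three subframes")
    buffer = [sample for sub in history for sample in sub]
    buffer.extend(current)
    return buffer


def update_history(history, reconstituted):
    """Drop the oldest subframe and append the newest reconstituted one."""
    return history[1:] + [list(reconstituted)]


if __name__ == "__main__":
    print(frame_statistics())
    # Assumption: the history is all zeros before the first frame.
    history = [[0] * SUBFRAME_LEN for _ in range(NUM_SUBFRAMES - 1)]
    residual = list(range(FRAME_LEN))   # placeholder residual, not real speech
    for sub in split_subframes(residual):
        buf = analysis_buffer(history, sub)
        # Placeholder: a real coder would store the decoder-side
        # reconstruction here, not the unquantised subframe itself.
        history = update_history(history, sub)
    print(len(buf))
```

With the figures from the text, each frame lasts 20 ms and holds 2080 raw bits, so 260 coded bits per frame correspond to 13 kbit/s, an 8:1 reduction. Keeping the history as reconstituted rather than original subframes means the encoder analyses the same past signal that a decoder will have available.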
These four subframes (the current one, and the three reconstituted ones) form a complete frame which is subjected to long-term prediction (LTP), which is actually quite simple and will be discussed in the next section. When this contribution is removed from each subframe, a set of pitch-like spikes remains, assuming of course the presence of pitch in the original speech. An RPE analysis engine compares the subvector of spikes to four candidates, one of which is chosen, along with a location (grid position) to represent the pitch spikes in that subframe. This pulse train is actually coded by ADPCM before transmission. This entire coding process is known as RPE-LTP, and is shown diagrammatically in Figure 5.15. If there were no pitch in the original speech (it has been judged to be unvoiced speech), then the residual is represented as random noise instead.

Up to 13 pitch pulses are coded per 40-sample subframe, achieved through downsampling at a ratio of 1:3 from several sequence start positions 1, 2 or 3. As can be imagined, a set of regular pulses is not particularly similar to the pitch waveform shown in