Applied Speech and Audio Processing: With MATLAB Examples



Speech communications
required to code more information, thus many algorithms give priority to only the most
important aspects for intelligibility, namely the lower entries on the list.
5.3.1 Regular pulse excitation
Regular Pulse Excitation (RPE) is a parametric coder that represents the pitch component
of speech. It is most famously implemented in ETSI standard 06.10, and is currently the
primary mobile speech communications method for over a third of the world’s population,
by any measure an impressive user base. This is due to its use in the GSM standard,
developed in the 1980s as a pan-European digital voice standard. It was endorsed by the
European Union, and quickly found adoption across Europe and then beyond.
GSM codes frames of 160 13-bit speech samples (at a sampling rate of 8 kHz) into
260 compressed bits. A decoder takes these and regenerates 160-sample output speech
frames. There are many sources of information on GSM, not least the open standard
documents, so there is no need to consider full details here. However, we will examine
the pitch coding system for GSM 06.10, the traditional or ‘full rate’ standard.
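The frame figures quoted above imply a frame duration, a coded bit rate and a compression ratio. The short sketch below simply works through that arithmetic; the function and constant names are illustrative, and Python is used here purely for convenience.

```python
# Frame arithmetic for GSM 06.10 full rate, using only the figures quoted
# in the text: 160 samples of 13 bits at 8 kHz, coded into 260 bits.
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
BITS_PER_SAMPLE = 13
CODED_BITS_PER_FRAME = 260


def frame_duration_ms():
    """Duration of one speech frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME // SAMPLE_RATE_HZ


def input_bit_rate():
    """Bit rate of the uncompressed 13-bit input, in bits per second."""
    return SAMPLES_PER_FRAME * BITS_PER_SAMPLE * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME


def coded_bit_rate():
    """Bit rate of the compressed stream, in bits per second."""
    return CODED_BITS_PER_FRAME * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME


def compression_ratio():
    """Ratio of input bits to coded bits for one frame."""
    return (SAMPLES_PER_FRAME * BITS_PER_SAMPLE) / CODED_BITS_PER_FRAME


print(frame_duration_ms(), input_bit_rate(), coded_bit_rate(), compression_ratio())
```

Each frame therefore spans 20 ms, and the coder reduces a 104 kbit/s input to 13 kbit/s, a ratio of 8:1.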
In GSM, the original speech is analysed to determine vocal tract parameters (LPC
coefficients), which are then used to filter the same vector of 160 speech samples to
remove the vocal tract information, leaving a residual. The eight LPC coefficients are
transformed into LARs (Log Area Ratios) for transmission.
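The analysis step described above can be sketched in floating point as follows. This is a minimal illustration under stated assumptions, not the bit-exact fixed-point procedure of the standard: it uses the autocorrelation method with a textbook Levinson-Durbin recursion, assumes zero filter history before the frame, and uses the common definition LAR = log10((1 + r) / (1 - r)), whose sign convention varies between texts. All function names are hypothetical.

```python
import math

LPC_ORDER = 8  # eight coefficients, as stated in the text


def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, reflection) where A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    a = [1.0] + [0.0] * order
    reflection = [0.0] * order
    err = r[0]
    if err <= 0.0:          # silent frame: nothing to predict
        return a, reflection
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection[i - 1] = k
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:      # perfectly predictable signal: stop early
            break
    return a, reflection


def log_area_ratios(reflection, limit=0.9999):
    """Map reflection coefficients to LARs; clamping avoids log of 0 or infinity."""
    lars = []
    for k in reflection:
        k = max(-limit, min(limit, k))
        lars.append(math.log10((1.0 + k) / (1.0 - k)))
    return lars


def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z), assuming zero history before it."""
    order = len(a) - 1
    return [sum(a[j] * frame[n - j] for j in range(min(order, n) + 1))
            for n in range(len(frame))]
```

Given a 160-sample `frame`, calling `autocorrelation(frame, LPC_ORDER)`, then `levinson_durbin`, then `lpc_residual` yields a residual with the short-term (vocal tract) correlation largely removed, and `log_area_ratios` gives eight values in the form intended for quantisation.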
The residual is then split into four subframes. Each subframe is analysed separately to
determine pitch parameters. The analysis is made on the current subframe concatenated
with the three previous reconstituted subframes. The reconstituted subframes are those
that have been generated from the previous pitch values – those that have been quantised
for transmission. Thus they are effectively the subframes as generated by a decoder.
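The subframe handling described above can be sketched as follows, assuming a simple cross-correlation search over the 120 reconstituted samples. The lag range of 40 to 120 samples, the unquantised gain and the function names are assumptions made for illustration; a real coder quantises both lag and gain before building the reconstituted history.

```python
SUBFRAME_LEN = 40
SUBFRAMES_PER_FRAME = 4
HISTORY_LEN = 3 * SUBFRAME_LEN  # three previous reconstituted subframes


def split_subframes(residual):
    """Split a 160-sample residual into four 40-sample subframes."""
    total = SUBFRAME_LEN * SUBFRAMES_PER_FRAME
    return [residual[i:i + SUBFRAME_LEN] for i in range(0, total, SUBFRAME_LEN)]


def ltp_analyse(current, history, min_lag=40, max_lag=120):
    """Find the lag whose past segment best matches the current subframe.

    `history` holds the most recent reconstituted samples (oldest first).
    Returns (lag, gain, remainder), where remainder is the current
    subframe with the long-term (pitch) prediction subtracted.
    """
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + len(current)]
        corr = sum(c * s for c, s in zip(current, segment))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    start = len(history) - best_lag
    segment = history[start:start + len(current)]
    energy = sum(s * s for s in segment)
    gain = best_corr / energy if energy > 0.0 else 0.0
    remainder = [c - gain * s for c, s in zip(current, segment)]
    return best_lag, gain, remainder
```

Because the search only looks at samples a decoder could also reconstruct, encoder and decoder stay in step even though the decoder never sees the original residual.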
These four subframes (the current one, and the three reconstituted ones) form a complete
frame which is subjected to long-term prediction (LTP). LTP is actually quite simple,
and will be discussed in the next section. When the LTP contribution is removed
from each subframe, a set of pitch-like spikes remains, assuming of course the presence
of pitch in the original speech. An RPE analysis engine compares the subvector of
spikes to four candidates, one of which is chosen, along with a location (grid position)
to represent the pitch spikes in that subframe.
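One way to picture the candidate selection is as a choice between regularly spaced sample grids. The sketch below assumes four candidate grids with offsets 0 to 3, each taking every third sample (13 pulses from a 40-sample subframe), and picks the grid with the most energy. The offsets, the energy criterion and the function names are assumptions for illustration; any weighting filter applied before selection is omitted.

```python
def rpe_grid_select(subframe, decimation=3, num_pulses=13, num_grids=4):
    """Pick the regularly spaced grid that captures the most energy.

    Candidate m consists of samples m, m + 3, m + 6, ... (13 in total).
    Returns (grid_position, pulses).
    """
    best_grid, best_energy, best_pulses = 0, -1.0, []
    for m in range(num_grids):
        pulses = [subframe[m + decimation * i] for i in range(num_pulses)]
        energy = sum(p * p for p in pulses)
        if energy > best_energy:
            best_grid, best_energy, best_pulses = m, energy, pulses
    return best_grid, best_pulses


def rpe_reconstruct(grid, pulses, length=40, decimation=3):
    """Decoder side: place the pulses back on their grid, zeros elsewhere."""
    out = [0.0] * length
    for i, p in enumerate(pulses):
        out[grid + decimation * i] = p
    return out
```

Only the grid position and the 13 pulse amplitudes need to be sent; the decoder fills the remaining 27 samples of the subframe with zeros.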
This pulse train is actually coded by ADPCM before transmission. This entire coding
process is known as RPE-LTP, and is shown diagrammatically in Figure 5.15. If there
is no pitch in the original speech (that is, it has been judged to be unvoiced), then the
residual is represented as random noise instead.
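As a consistency check on the 260-bit frame size, the sketch below adds up a per-parameter bit allocation. The field names are illustrative, and the widths are those commonly quoted for GSM 06.10 full rate rather than anything stated in this section, so they should be checked against the standard before being relied upon.

```python
# Assumed bit widths (commonly quoted for GSM 06.10 full rate).
LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]   # eight log area ratios, once per frame

SUBFRAME_BITS = {
    "ltp_lag": 7,        # pitch lag
    "ltp_gain": 2,       # pitch gain
    "rpe_grid": 2,       # grid position (one of four candidates)
    "block_max": 6,      # per-subframe amplitude scale
    "pulses": 13 * 3,    # 13 pulses at 3 bits each
}

SUBFRAMES_PER_FRAME = 4


def bits_per_frame():
    """Total coded bits for one 160-sample frame under the assumed widths."""
    per_subframe = sum(SUBFRAME_BITS.values())
    return sum(LAR_BITS) + SUBFRAMES_PER_FRAME * per_subframe


print(bits_per_frame())
```

Under these assumptions the LARs take 36 bits and each subframe 56 bits, which together account for the 260 bits per frame quoted earlier.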
Up to 13 pitch pulses are coded per 40-sample subframe, achieved through downsampling
at a ratio of 1:3 from several sequence start positions 1, 2 or 3. As can be imagined,
a set of regular pulses is not particularly similar to the pitch waveform shown in
