Applied Speech and Audio Processing: With MATLAB Examples

Date: 18.10.2023

Advanced topics
contraction of pitch periods should be accomplished with as little damage to the pitch
pulse shape as possible.
There are two ways of achieving this in general. The first, time domain method, is to
detect the pitch periods, and scale the waveform shape in a way which is sensitive to
the importance of pitch. The second is to completely separate the pitch signal from the
speech, and then scale the pitch as required whilst either leaving the remaining speech
components untouched, or scaling them in a different way. We will explore two examples:
one of a pitch-synchronous time domain scaling, and the other of an LPC-based speech
decomposition method. Both methods result in reasonable quality scaled speech that,
unless the scaling ratios are very large, can be convincingly natural.
7.9.1 PSOLA
The primary traditional method for pitch scaling in audio is known as PSOLA (Pitch
Synchronous Overlap and Add). This algorithm lives up to its pitch-synchronous name
by first determining a fundamental pitch period. It then segments audio into frames
of twice that size, windows them and reassembles the frames using an overlap-add
method at a different rate (see Section 2.4 for a discussion on segmentation and
overlap) [28].
The different rate of reassembly could either be faster or slower than the original,
but as with most such techniques, extreme adjustments can cause significant quality
degradation. Figure 7.10 demonstrates the process of speeding up a recording of speech.
In this figure, a period of input speech (top waveform) is analysed to determine its
fundamental pitch period, M. The speech is then segmented into frames of size 2M
with a 50% overlap, ideally centred on the pitch pulse. Each frame is windowed (see
Section 2.4.2), and the resultant audio is ‘stitched together’ at a different rate. In this
case, the pitch pulses are more frequent, thus increasing the pitch rate of the resulting
audio.
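
As a small worked illustration of why closer pitch pulses mean a higher pitch, the arithmetic can be sketched as below. The sampling rate fs, the period M = 80 samples and the scaling ratio sc are assumed values chosen for illustration, not figures taken from Figure 7.10.

```matlab
% Assumed values for illustration only
fs = 8000;            % sampling rate in Hz (assumption)
M  = 80;              % fundamental pitch period in samples (assumption)
sc = 0.7;             % hypothetical scaling ratio, below 1 packs pulses closer

M2 = round(M*sc);     % spacing between pitch pulses after reassembly

f0_in  = fs/M;        % pitch of the input, in Hz
f0_out = fs/M2;       % pitch of the reassembled audio, in Hz
```

With these assumed numbers the pulse spacing falls from 80 to 56 samples, so the pitch rises from 100 Hz to roughly 143 Hz.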
Matlab code to demonstrate the effectiveness of the PSOLA algorithm is provided
below. This relies upon the function ltp() for the pitch extraction method, which we
developed in Section 5.3.2.1 to perform long-term prediction (LTP).
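
For readers without Section 5.3.2.1 to hand, the idea behind a 1-tap long-term predictor can be sketched as follows. This is not the ltp() function itself but a minimal stand-in: the synthetic test signal, the search limits minLag and maxLag, and the exhaustive search are all assumptions made for illustration. It picks the lag M whose delayed copy of the signal best predicts the current segment, together with the matching gain B.

```matlab
% Synthetic stand-in for voiced speech (assumption): decaying pulses
% repeating every 80 samples.
N  = 800;
sp = zeros(N, 1);
sp(1:80:N) = 1;
sp = filter(1, [1 -0.9], sp);

minLag = 20;                 % hypothetical search limits, in samples
maxLag = 160;
x = sp(maxLag+1:N);          % fixed analysis segment
best = -Inf;  M = minLag;  B = 0;
for lag = minLag:maxLag
    xd  = sp(maxLag+1-lag : N-lag);   % same segment delayed by 'lag'
    num = sum(x .* xd);               % cross-correlation at this lag
    den = sum(xd .^ 2);               % energy of the delayed segment
    if den > 0 && num > 0 && num^2/den > best
        best = num^2/den;    % larger value means smaller prediction error
        M = lag;             % candidate pitch period
        B = num/den;         % matching 1-tap gain
    end
end
```

Because the analysis segment x is the same for every lag, maximising num^2/den is equivalent to minimising the energy of the prediction error x - B*xd. Real speech would normally be analysed frame by frame rather than over a whole recording.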
Within the code, a Hamming window is applied to frames of size 2M. The array
indexing, using variables fr1, to1, fr2 and to2 to denote the start and end indices of
each array, is the heart of the method. This can be applied to a short recording of a couple
of words of speech. In this case, a scaling of 0.7 will very clearly speed up the speech,
and a scaling of perhaps 1.4 will slow down the speech. Note that the intelligibility
of the speech remains clear, although the characteristics of the speaker’s voice will
change.
%Determine the pitch with a 1-tap LTP
[B, M] = ltp(sp);
%Scaling ratio
sc=0.35;
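
The frame indexing described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the book's own listing: the input is a synthetic pulse train instead of recorded speech, the pitch period M is taken as known instead of coming from ltp(), the Hamming window is written out explicitly so that no toolbox is needed, and the exact index arithmetic for fr1, to1, fr2 and to2 is one plausible choice. The value sc = 0.7 is the speed-up ratio mentioned in the text.

```matlab
% Synthetic stand-in for voiced speech (assumption): decaying pulses
M  = 80;                        % pitch period in samples, assumed known
sp = zeros(1600, 1);
sp(1:M:end) = 1;
sp = filter(1, [1 -0.9], sp);

sc = 0.7;                       % scaling ratio: below 1 speeds up the audio
M2 = round(M*sc);               % spacing of frames in the output
L  = 2*M;                       % frame length of twice the pitch period

% Symmetric Hamming window of length L, written out explicitly
win = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));

N   = floor(length(sp)/M) - 1;  % number of complete 2M frames at hop M
out = zeros((N-1)*M2 + L, 1);
for n = 1:N
    fr1 = 1 + (n-1)*M;          % start of input frame (50% overlap)
    to1 = fr1 + L - 1;          % end of input frame
    seg = sp(fr1:to1) .* win;   % windowed frame
    fr2 = 1 + (n-1)*M2;         % start of frame in the output
    to2 = fr2 + L - 1;          % end of frame in the output
    out(fr2:to2) = out(fr2:to2) + seg;   % overlap-add at the new rate
end
```

In this sketch each frame is read at a hop of M samples and written at a hop of M2 samples, so the output is shorter than the input and its main pulses sit M2 samples apart. With recorded speech, sp and M would instead come from the recording and the pitch extraction step.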
