Applied Speech and Audio Processing: With MATLAB Examples



Advanced topics

7.9. Voice and pitch changer
Figure 7.10
An illustration of the PSOLA algorithm: the pitch period of the top waveform is
analysed, and the waveform is segmented into double-sized frames, one per pitch period.
The frames are then windowed and reassembled by overlap-adding with reduced spacing,
giving the higher-pitched signal shown in the lower waveform.
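%M (pitch period in samples) and sc (scaling ratio) are assumed to be set earlier
%Force a column vector so that it matches the window
sp=sp(:);
M2=round(M*sc);
%Segment the recording into N frames
N=floor(length(sp)/M);
out=zeros(N*M2+M,1);
%hamming() takes the window length only and returns a column
win=hamming(2*M);
%Window each and reconstruct
for n=1:N-1
    %Indexing is all important
    fr1=1+(n-1)*M;
    to1=n*M+M;
    seg=sp(fr1:to1).*win;
    fr2=1+(n-1)*M2-M;
    to2=(n-1)*M2+M;
    %Avoid negative indexing
    fr2b=max([1,fr2]);
    out(fr2b:to2)=out(fr2b:to2)+seg(1+fr2b-fr2:2*M);
end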

Speech scaled by the PSOLA algorithm above will most probably still sound reasonably
true to human speech (in contrast to a straightforward adjustment of sample rate).
PSOLA is also reported to work very well with music.
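The index arithmetic of the overlap-add loop can be checked on a synthetic signal. The sketch below is only an illustration: it assumes a pulse train with a known period in place of a real recording, picks arbitrary values for M and sc, and writes the Hamming window out explicitly so that no toolbox call is needed. It checks that the pulse spacing in the output equals the scaled period M2. It says nothing about how a real voice would sound.

```matlab
% Sketch only: a synthetic pulse train stands in for a voiced recording
M  = 80;                  % assumed pitch period in samples
sc = 0.75;                % assumed scaling ratio (below 1 shortens the period)
sp = zeros(20*M,1);
sp(M:M:end) = 1;          % one unit pulse per pitch period

M2  = round(M*sc);        % scaled pitch period
N   = floor(length(sp)/M);
out = zeros(N*M2+M,1);
k   = (0:2*M-1)';
win = 0.54 - 0.46*cos(2*pi*k/(2*M-1));   % Hamming window of length 2*M

for n = 1:N-1
    fr1  = 1+(n-1)*M;                    % start of the double-length frame
    seg  = sp(fr1:fr1+2*M-1).*win;       % windowed frame, pulse near its centre
    fr2  = 1+(n-1)*M2-M;                 % output position at the reduced spacing
    to2  = (n-1)*M2+M;
    fr2b = max(1,fr2);                   % clip the first frame at index 1
    out(fr2b:to2) = out(fr2b:to2) + seg(1+fr2b-fr2:end);
end

peaks = find(out > 0.5);  % strong pulses only; the window tails stay near 0.08
disp(unique(diff(peaks)));               % spacing between output pulses
```

With these values the displayed spacing is 60 samples, that is M2, while the input pulses were 80 samples apart.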
7.9.2. LSP-based method
For adjustment of speech, with potentially better performance than PSOLA, the speech
signal can be decomposed using a CELP-style analysis-by-synthesis system, into pitch
and vocal tract components. The pitch can then be scaled linearly as required, and the
vocal tract resonances can also be tugged upward or downward (by smaller amounts) to
further scale the speech.
In the CELP vocoder, these alterations can be performed between the encode and the
decode process, on the encoded speech parameters themselves. LSP parameter changes
(on formant-describing line pairs) are used to tug formants either higher or lower in
frequency. Scaling the pitch delay parameter in a one-tap LTP (see Section 5.3.2.1)
similarly adjusts the pitch period. With these changes it is possible to shift vocal
frequencies, either to change the pitch of a speaker’s voice, or to scramble their voice
in some way.
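As a rough illustration of these two adjustments, the sketch below scales a one-tap LTP delay and applies a smaller uniform scaling to a set of LSP frequencies. All of the numbers are hypothetical (they are not taken from any recording), the variable names lspA, lspB, tapB, psc and fsc are chosen here for illustration, and uniform scaling is only one simple way of tugging the lines. A real system might move selected line pairs instead.

```matlab
% Sketch only: hypothetical parameters for a single analysis frame
lspA = [0.30 0.42 0.75 0.95 1.40 1.62 2.05 2.30 2.60 2.85];  % LSPs in radians, ascending
tapA = 64;                % one-tap LTP delay in samples
psc  = 0.8;               % assumed pitch scaling ratio
fsc  = 1.05;              % assumed (smaller) formant scaling ratio

% A shorter delay means a shorter pitch period, so a higher pitch
tapB = round(tapA*psc);   % the delay must remain an integer number of samples

% Tug every line upward by the same small factor
lspB = lspA*fsc;

% The lines must stay strictly ordered inside (0, pi) for a stable synthesis filter
if any(diff(lspB) <= 0) || lspB(1) <= 0 || lspB(end) >= pi
    error('Shifted LSPs are no longer ordered inside (0, pi)');
end

disp(tapB);
disp(lspB);
```

The ordering check matters more once individual lines, rather than the whole set, are moved, since a line pushed past its neighbour no longer describes a valid filter.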
A block diagram of a CELP codec modified to perform pitch changing and voice
scrambling is shown in Figure 7.11. It can be seen that the encoder and the decoder
themselves are identical to the standard CELP coder of Chapter 5, although in practice
it would not be necessary to highly quantise the CELP analysis parameters as we would
do in a speech compression system.
If these vocal parameters are not severely quantised, then the targeted adjustments
made to the LSP and LTP parameters may well be the only perceptible changes made to
the processed speech.
These changes are the scaling of the LTP delay parameter and the shifting of
formant-describing LSP pairs. For both, a blanket shift throughout a speech recording
would probably work, but a more intelligent system that shifted based upon an analysis
of the underlying speech could provide better performance. This could, for example,
scale the pitch and shift the formants of voiced speech, but leave the pitch (if any) and
formants of unvoiced speech untouched.
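One way such a selective system could be arranged is sketched below. It is a minimal illustration under stated assumptions: the per-frame values are invented, and treating a frame as voiced when its one-tap LTP gain exceeds a threshold is an assumption made here for simplicity, not a method given in the text. The names beta, tap, vth and psc are likewise hypothetical.

```matlab
% Sketch only: hypothetical per-frame parameters, illustrative numbers
beta = [0.85 0.10 0.72 0.05];    % one-tap LTP gain per frame
tap  = [64 37 66 90];            % one-tap LTP delay per frame, in samples
psc  = 0.8;                      % assumed pitch scaling ratio
vth  = 0.5;                      % assumed voicing threshold on the LTP gain

voiced = beta > vth;             % logical mask, true for frames treated as voiced
tap2   = tap;                    % start from the unmodified delays
tap2(voiced) = round(tap(voiced)*psc);   % scale the delay in voiced frames only

disp([tap; tap2]);               % original delays above, adjusted delays below
```

Here the first and third frames are treated as voiced, so their delays become 51 and 53 samples, while the second and fourth frames keep their delays of 37 and 90. The same mask could be used to restrict any LSP shift to voiced frames.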
Although this technique should properly be inserted into a continuous CELP-like
analysis-by-synthesis structure, something of the potential can be demonstrated using
the Matlab code below. In this case, ideally a very short single voiced phoneme such
as /a/ needs to be recorded into array sp. We will then perform a single-tap LTP pitch
extraction, and an LPC analysis on the voice minus pitch before we scale the LSPs, scale
the pitch delay and recreate the speech:
%Determine the pitch with a 1-tap LTP
[Beta, tapA] = ltp(sp);
