Applied Speech and Audio Processing: With matlab examples
7.9. Voice and pitch changer
Figure 7.10 An illustration of the PSOLA algorithm: the pitch period of the top waveform is analysed and the waveform is segmented into double-sized frames, one per pitch period, which are then windowed and reassembled by overlap-adding with reduced spacing to create the higher-pitched signal in the lower waveform.

    %M (pitch period in samples), sc (scale factor) and sp (speech,
    %assumed to be a column vector) are defined before this excerpt
    M2=round(M*sc);
    win=hamming(2*M);
    %Segment the recording into N frames
    N=floor(length(sp)/M);
    out=zeros(N*M2+M,1);
    %Window each and reconstruct
    for n=1:N-1
        %Indexing is all important
        fr1=1+(n-1)*M;
        to1=n*M+M;
        seg=sp(fr1:to1).*win;
        fr2=1+(n-1)*M2-M;
        to2=(n-1)*M2+M;
        fr2b=max([1,fr2]); %Avoid negative indexing
        out(fr2b:to2)=out(fr2b:to2)+seg(1+fr2b-fr2:2*M);
    end

Speech scaled by the PSOLA algorithm above will most probably still sound reasonably true to human speech (in contrast to a straightforward adjustment of sample rate). PSOLA is also reported to work very well with music.

7.9.2 LSP-based method

For adjustment of speech with potentially better performance than PSOLA, the speech signal can be decomposed, using a CELP-style analysis-by-synthesis system, into pitch and vocal tract components. The pitch can then be scaled linearly as required, and the vocal tract resonances can also be tugged upward or downward (by smaller amounts) to further scale the speech. In the CELP vocoder, these alterations can be performed between the encode and the decode process, on the encoded speech parameters themselves.

LSP parameter changes (on formant-describing line pairs) are used to tug formants either higher or lower in frequency. Scaling the pitch delay parameter in a one-tap LTP (see Section 5.3.2.1) similarly adjusts the pitch period. With these changes it is possible to shift vocal frequencies, either to change the pitch of a speaker's voice, or to scramble their voice in some way. A block diagram of a CELP codec modified to perform pitch changing and voice scrambling is shown in Figure 7.11.
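The two parameter adjustments described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the codec of Figure 7.11: the LSP values, the LTP delay, and the two scale factors psc and fsc are hypothetical, and all lines are scaled together rather than only selected formant-describing pairs.

```matlab
% Hypothetical parameter values for one voiced frame (not from the text)
lsp = [0.31 0.42 0.78 1.05 1.42 1.63 1.98 2.21 2.57 2.78]; % LSPs in radians, ascending
tap = 80;    % one-tap LTP delay in samples (100 Hz pitch at 8 kHz sampling)
psc = 1.2;   % assumed pitch scale factor (>1 raises the pitch)
fsc = 1.05;  % assumed, smaller, formant scale factor

% Raising the pitch by psc shortens the pitch period by the same factor
tap2 = round(tap / psc);

% Tug every line upward in frequency (a blanket scaling of all lines)
lsp2 = lsp * fsc;

% The lines must stay ascending and inside (0, pi) to describe a stable filter
if any(diff(lsp2) <= 0) || lsp2(1) <= 0 || lsp2(end) >= pi
    error('Scaled LSPs are no longer valid');
end

fprintf('LTP delay %d -> %d samples\n', tap, tap2);
```

The validity check matters because a large scale factor can push the highest line past pi, at which point the modified parameters no longer describe a usable synthesis filter.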
In Figure 7.11 it can be seen that the encoder and the decoder themselves are identical to the standard CELP coder of Chapter 5, although in practice it would not be necessary to quantise the CELP analysis parameters as heavily as we would in a speech compression system. If these vocal parameters are not severely quantised, then the targeted adjustments made to the LSP and LTP parameters may well be the only perceptible changes made to the processed speech. These adjustments are the scaling of the LTP delay parameter and the shifting of formant-describing LSP pairs.

For both, a blanket shift throughout a speech recording would probably work, but a more intelligent system, which shifted based upon an analysis of the underlying speech, could provide better performance. This could, for example, scale the pitch and shift the formants of voiced speech, but leave the pitch (if any) and formants of unvoiced speech untouched.

Although this technique should properly be inserted into a continuous CELP-like analysis-by-synthesis structure, something of its potential can be demonstrated using the MATLAB code below. In this case, ideally a very short single voiced phoneme such as /a/ needs to be recorded into array sp. We will then perform a single-tap LTP pitch extraction and an LPC analysis on the voice minus pitch, before we scale the LSPs, scale the pitch delay and recreate the speech:

    %Determine the pitch with a 1-tap LTP
    [Beta, tapA] = ltp(sp);