Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
contraction of pitch periods should be accomplished with as little damage to the pitch pulse shape as possible. There are two general ways of achieving this. The first, a time-domain method, is to detect the pitch periods and scale the waveform shape in a way that is sensitive to the importance of pitch. The second is to completely separate the pitch signal from the speech, and then scale the pitch as required whilst either leaving the remaining speech components untouched or scaling them in a different way.

We will explore two examples: one of pitch-synchronous time-domain scaling, and the other of an LPC-based speech decomposition method. Both methods result in reasonable-quality scaled speech that, unless the scaling ratios are very large, can be convincingly natural.

7.9.1 PSOLA

The primary traditional method for pitch scaling in audio is known as PSOLA (Pitch Synchronous Overlap and Add). This algorithm lives up to its pitch-synchronous name by first determining a fundamental pitch period. It then segments the audio into frames of twice that size, windows them, and reassembles the frames using an overlap-add method at a different rate (see Section 2.4 for a discussion on segmentation and overlap) [28]. The rate of reassembly can be either faster or slower than the original but, as with most such techniques, extreme adjustments can cause significant quality degradation.

Figure 7.10 demonstrates the process of speeding up a recording of speech. In this figure, a period of input speech (top waveform) is analysed to determine its fundamental pitch period, M. The speech is then segmented into frames of size 2M with a 50% overlap, ideally centred on the pitch pulse. Each frame is windowed (see Section 2.4.2), and the resultant audio is ‘stitched together’ at a different rate. In this case, the pitch pulses are more frequent, thus increasing the pitch rate of the resulting audio.
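The frame bookkeeping in this description can be written compactly. The following is a sketch under simplifying assumptions rather than a formula taken from the text: the pitch period M is treated as constant over the passage, s[k] is the input speech, w[k] is a window of length 2M, α is the scaling ratio, F_s is the sample rate, and frames are indexed n = 0, 1, ..., N-1. All of these symbols except M are introduced here for illustration only.

```latex
\begin{align*}
x_n[k] &= w[k]\, s[nM + k], \qquad 0 \le k < 2M \\
M' &= \operatorname{round}(\alpha M) \\
y[j] &= \sum_{n=0}^{N-1} x_n[j - nM'], \qquad x_n[k] = 0 \ \text{for}\ k \notin [0, 2M) \\
f_0' &= \frac{F_s}{M'} \approx \frac{1}{\alpha}\,\frac{F_s}{M} = \frac{f_0}{\alpha}
\end{align*}
```

With α < 1 the frames are laid down closer together than they were taken, so the pitch pulses arrive more often and the passage shortens, which corresponds to the speeding-up case of Figure 7.10.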
MATLAB code to demonstrate the effectiveness of the PSOLA algorithm is provided below. This relies upon the function ltp() for the pitch extraction method, which we developed in Section 5.3.2.1 to perform long-term prediction (LTP). Within the code, a Hamming window is applied to frames of size 2M. The array indexing, using variables fr1, to1, fr2 and to2 to denote the start and end indices of each array, is the heart of the method.

This can be applied to a short recording of a couple of words of speech. In this case, scaling of 0.7 will very clearly speed up the speech, and a scaling of perhaps 1.4 will slow down the speech. Note that the intelligibility of the speech remains clear, although the characteristics of the speaker’s voice will change.

    %Determine the pitch with a 1-tap LTP
    [B, M] = ltp(sp);
    %Scaling ratio
    sc=0.35;
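The book's listing is cut off in this excerpt, so the following is an independent, minimal sketch of the overlap-add indexing described in the text, not a reconstruction of the author's code. It makes several assumptions: the pitch period M is fixed by hand instead of being estimated with ltp() (which is not reproduced here), the input sp is a synthetic pulse train rather than recorded speech, Fs and pulse are hypothetical names, and the Hamming window is computed directly so that no toolbox is needed. The names fr1, to1, fr2 and to2 follow the description in the text.

```matlab
% PSOLA-style overlap-add sketch on a synthetic periodic signal.
% Assumption: the pitch period M is known and constant; real speech
% would need a pitch estimator such as the book's ltp() function.
Fs = 8000;                         % sample rate in Hz (assumed)
M  = 80;                           % pitch period in samples (assumed)

% Hypothetical test input: a decaying oscillation repeated every M samples.
k     = (0:M-1)';
pulse = exp(-k/10) .* cos(2*pi*k/8);
sp    = repmat(pulse, 50, 1);      % 50 pitch periods

sc = 0.7;                          % scaling ratio: <1 speeds up, >1 slows down
M2 = round(M*sc);                  % spacing between frames in the output

L   = 2*M;                         % frame length is twice the pitch period
win = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));   % Hamming window, length L

N   = floor(length(sp)/M) - 1;     % number of complete frames of length L
out = zeros((N-1)*M2 + L, 1);      % room for N frames spaced M2 apart

for n = 1:N
    % Input frame: starts every M samples, so frames overlap by 50%
    fr1 = 1 + (n-1)*M;
    to1 = fr1 + L - 1;
    seg = sp(fr1:to1) .* win;

    % Output position: same frame, but placed every M2 samples
    fr2 = 1 + (n-1)*M2;
    to2 = fr2 + L - 1;
    out(fr2:to2) = out(fr2:to2) + seg;
end
```

The result could be auditioned with soundsc(out, Fs) and compared against soundsc(sp, Fs). Because this sketch keeps the number of frames fixed and only changes their spacing, both the duration and the pitch period scale by roughly sc. For sc greater than 2 the frames would no longer overlap and gaps would appear in the output, which is one way that extreme ratios degrade quality.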