Applied Speech and Audio Processing: With MATLAB Examples
Audio analysis
6.1. Analysis toolkit
Figure 6.8 Deviation, bias and shift LSP analysis features collected for a 16 kHz sampled speech utterance extracted from the TIMIT database, with the waveform plotted on top.

In MATLAB this is again easy: assuming p = 10, we would simply create the reference LSPs using:

bar_w=[1:10]*pi/11

With the p lines distributed evenly across the spectrum, the power spectrum would be flat if these lines were transformed into LPC coefficients and that spectrum were calculated. Dev therefore measures how close each frame is to this distribution. With β = 2, it becomes the Euclidean distance between the actual and comparison distributions. Odd values such as β = 1 or β = 3 give the deviation from $\bar{\omega}_i$ a sign. A positive measure then denotes a high-frequency spectral bias, and a negative measure denotes a low-frequency spectral bias.

Each of these measures provides useful information about the underlying speech signal. Figure 6.8 shows all three applied to a speech recording from the TIMIT database [2] (the deviation plot is given for β = 2).

Shift indicates the predominant movement of the LSP frequency distribution between consecutive frames. Consider two adjacent frames containing unvoiced and voiced speech, respectively. The LSP distribution will be high-frequency biased in the first frame and low-frequency biased in the second. We saw a similar effect when comparing the spoken C and R spectra in Section 6.1.4. A large difference between the two frames gives a large measure value. As shown in Figure 6.8, the shift measure peaks at obvious changes in the speech waveform, so it may be useful for speech segmentation.

Bias indicates frequency trends within the current frame, that is, whether the spectrum of the current frame is high-frequency or low-frequency biased. It is similar to the deviation measure, which determines how close the LSP distribution of the current frame is to a predetermined comparison distribution.
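The deviation and shift measures can be sketched in a few lines of code. The sketch below is in plain Python, although the book's own examples are in MATLAB. It builds the evenly spaced reference set, which is equivalent to bar_w=[1:10]*pi/11 for p = 10. It then computes Dev and Shift for LSP frames given in radians.

Equation (6.7) is not reproduced in this excerpt, so the formulas used here are assumptions chosen to match the behaviour described above:

- Dev is taken as $\sum_i (\omega_i - \bar{\omega}_i)^\beta$.
- Shift is taken as the summed line movement $\sum_i (\omega_i[n] - \omega_i[n+1])$.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def reference_lsps(p: int) -> List[float]:
    """Evenly spaced comparison LSPs: i*pi/(p+1) for i = 1..p.

    For p = 10 this matches the MATLAB line bar_w=[1:10]*pi/11; such
    evenly spaced lines correspond to a flat power spectrum.
    """
    return [i * math.pi / (p + 1) for i in range(1, p + 1)]


def lsp_dev(w: Sequence[float], w_bar: Sequence[float], beta: int = 2) -> float:
    """Deviation of a frame's LSPs w from the comparison set w_bar.

    ASSUMED form: sum over i of (w_i - w_bar_i) ** beta.
    beta = 2 gives the squared Euclidean distance; an odd beta keeps the
    sign, positive when the lines sit above w_bar (high-frequency bias).
    """
    return sum((wi - bi) ** beta for wi, bi in zip(w, w_bar))


def lsp_shift(w_now: Sequence[float], w_next: Sequence[float]) -> float:
    """Movement of the LSP distribution from frame n to frame n+1.

    ASSUMED form: sum over i of (w_i[n] - w_i[n+1]). A large magnitude
    means a large spectral change, e.g. at an unvoiced/voiced boundary.
    """
    return sum(a - b for a, b in zip(w_now, w_next))


if __name__ == "__main__":
    p = 10
    w_bar = reference_lsps(p)
    # Hypothetical frame whose lines all sit 0.05 rad above the reference
    w_hf = [b + 0.05 for b in w_bar]
    print(lsp_dev(w_bar, w_bar))         # 0.0: the reference itself
    print(lsp_dev(w_hf, w_bar, beta=1))  # positive (about 0.5): HF bias
    print(lsp_shift(w_hf, w_bar))        # positive: lines moved down
```

Each frame reduces to a single number. Applying these functions frame by frame therefore yields feature tracks of the kind plotted in Figure 6.8.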
In Figure 6.8, the bias measure registers high values for fricatives, reflecting the predominance of their high-frequency components.

Suppose the speech is contaminated by noise with a particularly well-defined spectral shape. Setting the comparison distribution $\bar{\omega}$ of Equation (6.7) to represent that shape can make the averaged Dev measure reasonably insensitive to the noise. In other words, analysing the noise alone will produce a zero-mean output.

We can also use LSP data to estimate the position of spectral peaks within an analysis frame. Peaks are located approximately halfway between pairs of closely spaced lines, and the closer the lines, the stronger the peak. Ranking the few most closely spaced line pairs by narrowness will generally reproduce the ordering of the corresponding peaks by power. In some cases, however, the correspondence is far less predictable, especially where three lines are close together. For unvoiced frames, other speech resonances are similarly reflected in the LSP distribution. When plotted, though, the visual correspondence is far less dramatic than for strongly voiced speech.
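The peak-picking heuristic above can be sketched as follows, again in plain Python and under stated assumptions:

- The LSPs of one frame are given in ascending order, in radians.
- Every adjacent pair of lines is a candidate peak.
- A narrower gap is taken to mean a stronger peak.
- The sampling rate defaults to 16 kHz, matching Figure 6.8.

The function name and the example frame are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def estimate_peaks(lsps: Sequence[float], n_peaks: int = 3,
                   fs: float = 16000.0) -> List[Tuple[float, float]]:
    """Rough spectral-peak estimates from one frame of LSPs.

    lsps: line frequencies in radians (0..pi), ascending.
    Each adjacent pair of lines is a candidate peak located at the pair's
    midpoint; narrower pairs are assumed to correspond to stronger peaks.
    Returns (frequency_hz, gap_rad) for the n_peaks narrowest pairs,
    narrowest (assumed strongest) first.
    """
    candidates = []
    for i in range(len(lsps) - 1):
        gap = lsps[i + 1] - lsps[i]
        midpoint = 0.5 * (lsps[i] + lsps[i + 1])
        candidates.append((gap, midpoint))
    candidates.sort(key=lambda c: c[0])  # rank by narrowness
    # Convert each midpoint from radians to Hz: f = w * fs / (2*pi)
    return [(mid * fs / (2 * math.pi), gap) for gap, mid in candidates[:n_peaks]]


if __name__ == "__main__":
    # Hypothetical 10th-order frame in radians, not measured data
    frame = [0.30, 0.34, 0.90, 1.20, 1.26, 1.80, 2.30, 2.40, 2.80, 3.00]
    for f_hz, gap in estimate_peaks(frame):
        print(f"peak near {f_hz:.0f} Hz (line gap {gap:.2f} rad)")
```

Three closely spaced lines produce two overlapping narrow pairs. This simple ranking reports them as two separate peaks, which is one way the less predictable cases mentioned above show up.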