Applied Speech and Audio Processing: With MATLAB Examples
Advanced topics
7.1 Psychoacoustic modelling
%Continuation of the critical band analysis listing: f2bark, bark2f,
%b4k, c, Eql, p and X are assumed to have been defined earlier.
b0=f2bark(0);          %Bark frequency of 0 Hz
n=128;                 %Size of spectrum being analysed
for bi=1:40            %Assuming a Bark resolution of 40 bands
   bark=b0+bi*(b4k-b0)/40;
   wm=round(n*bark2f(bark)/(4000));
   if (wm==0) wm=1; end
   %establish limits
   w_low=bark2f(bark - 1.3)*2*pi;
   w_hig=bark2f(bark + 2.5)*2*pi;
   wl=fix(w_low/(4000*2*pi/n));
   wh=fix(w_hig/(4000*2*pi/n));
   %clip to index size
   if(wl<1) wl=1; end
   if(wh>n) wh=n; end
   %perform summation
   for wi=wl:wh
      w=wi*2*pi*4000/n;
      %Find the spreading function argument (non-zero from -1.3 to 2.5)
      vlu=6*log( (w/c) + ((w/c)^2 + 1)^0.5 );
      vlu=vlu-bark;
      %Look up the corresponding multiplier
      mul=0;
      if(vlu<-1.3)
         mul=0;
      elseif(vlu<=-0.5)
         mul=10^(2.5*(vlu+0.5));
      elseif(vlu<0.5)
         mul=1;
      elseif(vlu<=2.5)
         mul=10^(0.5-vlu);
      end
      X(bi)=X(bi)+Eql(wm)*mul*p(wi);
   end
end

7.1.5 Intensity-loudness conversion

One final step remains, and that is to make an intensity-loudness conversion to relate the almost arbitrary units of the model to a perceived loudness scale, based on the power law of hearing. Whilst this is required for completeness, it is not needed where the model is used to compare two signals directly and the absolute difference is not required.

Φ(ω) = {E(ω) Ψ[Ω(ω)]}^0.33 .    (7.5)

7.1.6 Masking effect of speech

Most computational masking models have been derived for the situation in which a single loud tone masks a quieter tone. Some models have been developed beyond that, for cases of noise masking tones, or even tones masking noise. Unfortunately, the evidence that these models are accurate for any and every combination of sounds is weak. Despite this, the models have been applied, apparently successfully, to generalised audio, most notably in MP3 audio compression. A method of application in such generalised scenarios is shown in Figure 7.2, where a particular sound spectrum is analysed across a set of critical band regions. The sound falling within each critical band is totalled, and its masking contribution within that band determined. Then, for each band, the effect of masking spread from the immediately neighbouring bands is factored in, assuming it is additive to the masking originating from the band itself. It is unusual to consider masking spread from bands beyond the immediate neighbours.

The result of such an analysis is that each critical band has an individual masking level. Sounds within that band that are above the masking level will be audible, and sounds below the masking level will not be.

For application to speech, one relatively useful method of determining audibility is to look at the formant frequencies and consider these as independent tones, the most important being formants F2 and F3, which contribute more to intelligibility than the others (see Section 3.2.4). For segments of speech with no formants, the audibility of the overall spectrum (perhaps within the range 250 Hz to 2 kHz) can be determined instead. If the overall spectrum, or F2 and F3, are inaudible, then it is likely that the speech itself will be unintelligible. It is possible, however, that elements of the speech, whilst unintelligible, will still be audible. This underlines the fact that the ability of the human brain and hearing system to extract signal from noise is, at times, quite awesome.
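As a rough illustration of the intensity-loudness conversion of Equation (7.5), the following minimal MATLAB sketch (not part of the original listing) assumes that X holds the 40-band output of the critical band analysis above, already weighted by the equal-loudness curve, so that only the 0.33 power of the power law of hearing remains to be applied:

%Intensity-loudness conversion of Equation (7.5): power law of hearing
%X is assumed to be the 40-element critical band output from the
%listing above, already weighted by the equal-loudness curve
L = X .^ 0.33;      %perceived loudness in each Bark band

Since this conversion is the same monotonic function for any signal, it can be omitted when the model is used only to compare two signals band by band, as noted in Section 7.1.5.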
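The per-band audibility test described in Section 7.1.6 can be sketched as a small MATLAB function. This is not a listing from the text: the function name is_audible, the 10 dB in-band masking offset and the 25 dB neighbour spreading attenuation are illustrative placeholders, chosen only to show the structure of the procedure (total the energy in each band, derive a masking level, add the spread from the immediate neighbours, then compare a probe component such as a formant tone against the result).

function audible = is_audible(Eband, fb, flev)
%Hypothetical sketch of the per-band audibility test of Figure 7.2
%Eband: energy totalled in each critical band (e.g. 40 Bark bands), linear units
%fb:    band index of the probe component (e.g. a formant treated as a tone)
%flev:  level of the probe component, in the same linear units as Eband
nb = length(Eband);
%Masking contribution of each band: assumed to sit 10 dB below the in-band energy
self_mask = Eband * 10^(-10/10);
%Add spread from the immediate neighbours only, assumed additive,
%attenuated here by a placeholder 25 dB
spread = zeros(size(self_mask));
spread(2:nb)   = spread(2:nb)   + self_mask(1:nb-1) * 10^(-25/10);  %from lower band
spread(1:nb-1) = spread(1:nb-1) + self_mask(2:nb)   * 10^(-25/10);  %from upper band
mask = self_mask + spread;          %per-band masking level
%A component is audible if it exceeds the masking level in its band
audible = flev > mask(fb);
end

Under this scheme, formants F2 and F3 would each be reduced to a band index and level, and a speech segment flagged as potentially unintelligible when neither exceeds the masking level in its band.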