Applied Speech and Audio Processing: With MATLAB Examples
4.5. Auditory scene analysis
Figure 4.8 Illustration of the feature of common fate by comparing three sections of audio tones. The first section reproduces a pleasing note generated from three related sinewaves of different frequencies. The second section extends each of the three fundamentals of the original sound with their first harmonics. The third section modulates each of the three tones plus their first harmonic. The three modulating frequencies are unrelated.

4.5.3 Common fate

When groups of tones or noises start, stop or fluctuate together, they are more likely to be interpreted as being part of a combined sound, or at least as having a common source. As an example, a human voice actually consists of many component tones which are modulated by lung power, vocal tract shape, lip closure and so on, to create speech. It is likely that the ability to cluster sounds together by ‘common fate’ is the mechanism by which we interpret many sounds heard simultaneously as being speech from a single mouth. In noisy environments, or in the case of multi-speaker babble, common fate may well be the dominant mechanism by which we can separate out the sounds of particular individuals who are speaking simultaneously.

The same effect, and outcome, can be heard in an orchestra, where several musical instruments, even when playing the same fundamental note simultaneously, can be interpreted by the HAS as being played separately. The more the instruments differ, the more distinctive their modulation, and thus the easier they are to distinguish: a violin and a bassoon are easier to tell apart than two bassoons playing together.

Let us attempt to demonstrate this effect in Matlab by creating three sets of combined audio notes for playback, as illustrated in Figure 4.8. First of all we will define three sinewave frequencies, a, b and c, related by powers of 2^(1/12) so that they sound pleasant (see Infobox 2.5 on page 33, which describes the frequency relationship of musical notes).
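As a quick check of the note relationship just described, the following minimal sketch computes the three frequencies from the 220 Hz starting note. It assumes the 3 and 7 semitone offsets that appear in the listing that follows. The variable names semitones, f and ratios are illustrative only and are not taken from the book.

```matlab
% Sketch: note frequencies spaced by whole numbers of semitones.
% Each semitone step multiplies the frequency by 2^(1/12).
a = 220;                        % reference frequency in Hz, as in the text
semitones = [0 3 7];            % assumed offsets for a, b and c
f = a * 2.^(semitones/12);      % frequencies of the three notes in Hz
ratios = f / a;                 % frequency ratios relative to a

% Print one line per note: frequency and ratio to the reference
fprintf('%.2f Hz (ratio %.4f)\n', [f; ratios]);
```

The second and third notes come out at about 261.63 Hz and 329.63 Hz, with ratios close to the simple fractions 6/5 and 3/2, which is one way of seeing why the combination sounds pleasant.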
We will then use the tonegen function of Section 2.7.1 to create three sinewaves with these frequencies, and also three sinewaves of double the frequency:

    dur=1.2;
    Fs=8000;
    a=220;
    b=a*2^(3/12);   % 3 semitones above a
    c=a*2^(7/12);   % 7 semitones above a
    sa=tonegen(a, Fs, dur);
    sb=tonegen(b, Fs, dur);
    sc=tonegen(c, Fs, dur);
    sa2=tonegen(a*2, Fs, dur);
    sb2=tonegen(b*2, Fs, dur);
    sc2=tonegen(c*2, Fs, dur);

Next, two sounds are defined which in turn mix together these notes:

    sound1=sa+sb+sc;
    sound2=sound1+sa2+sb2+sc2;

Now, three different modulation patterns are created, again using the tonegen function, but at much lower frequencies of 7, 27 and 51 Hz, chosen arbitrarily so that they are not harmonically related to either the original tones or to each other. These are then used to modulate the various original tones:

    mod1=tonegen(7, Fs, dur);
    mod2=tonegen(27, Fs, dur);
    mod3=tonegen(51, Fs, dur);
    am=mod1.*(sa+sa2);
    bm=mod2.*(sb+sb2);
    cm=mod3.*(sc+sc2);

Finally, a short gap is defined for the third sound so that the three modulated components start 50 ms apart, which accentuates the common starting point of each note and its harmonic. The components are combined into sound3, and all three sounds are then placed into a composite vector for replaying:

    gap=zeros(1,Fs*0.05);   % 50 ms of silence
    sound3=[am,gap,gap]+[gap,bm,gap]+[gap,gap,cm];
    soundsc([sound1,sound2,sound3], Fs)

A listener exposed to the sounds created should hear, in turn, three segments of audio lasting approximately 1.2 s each. Firstly, a pleasing musical chord consisting of three notes is heard. Then this musical chord is augmented with some harmonically related higher frequencies. For both of these segments, the perception is of a single musical chord being produced. However, the final segment is rather discordant, and appears to consist of three separate audible components.

Because each of the three notes from the second segment is now modulated differently, the brain no longer considers them as coming from the same source, but rather from different sources. It thus separates them in its perceptual space.
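The modulation step above multiplies each note by a slow sinewave. As a hedged illustration of what that multiplication does to a single tone, the sketch below checks numerically that a 220 Hz sinewave modulated at 7 Hz is identical to a pair of components at 213 Hz and 227 Hz, following the identity sin(x)sin(y) = ½[cos(x−y) − cos(x+y)]. The anonymous function tone is a hypothetical stand-in for tonegen, assumed here to return a row vector of sine samples; the book's own function is defined in Section 2.7.1 and may differ in detail.

```matlab
% Sketch: amplitude modulation of one tone, checked against its sidebands.
Fs = 8000;                          % sample rate in Hz, as in the text
dur = 1.2;                          % duration in seconds, as in the text
n = 1:round(Fs*dur);                % sample indices

% Hypothetical stand-in for tonegen (an assumption, not the book's code)
tone = @(f) sin(2*pi*f*n/Fs);

carrier_f = 220;                    % note frequency in Hz
mod_f = 7;                          % modulating frequency in Hz

% Modulated tone, formed by element-wise multiplication as in the listing
am = tone(mod_f) .* tone(carrier_f);

% The same signal written as two components at carrier_f -/+ mod_f
sidebands = 0.5*(cos(2*pi*(carrier_f - mod_f)*n/Fs) ...
               - cos(2*pi*(carrier_f + mod_f)*n/Fs));

% The two forms should agree to within floating-point rounding
err = max(abs(am - sidebands));
fprintf('max difference = %g\n', err);
```

Each note and its harmonic share one modulating frequency and so fluctuate together, while the three notes fluctuate at unrelated rates. This is consistent with the grouping into three separate components described above.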