Applied Speech and Audio Processing: With MATLAB Examples

3.3. Speech understanding
In terms of implementation in MATLAB, we note first that performing an analysis
from −∞ to +∞ is an unrealistic expectation, so we would normally choose a
segment of N samples of audio to analyse, window it, and perform a fast Fourier
transform to obtain both power spectra, P and S. In discrete sampled form, we
follow the same method that we used in Chapter 2 to visualise signals in the
frequency domain:
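% s and p should be N-by-1 column vectors, matching the column returned by hamming(N)
S=fft(s.*hamming(N));          % windowed FFT of s
S=20*log10(abs(S(1:N/2)));     % magnitude in dB, bins 1 to N/2
P=fft(p.*hamming(N));          % windowed FFT of p
P=20*log10(abs(P(1:N/2)));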
and then proceed with the SD measure:
SD=mean((S-P).^2);
SD is a perceptually relevant difference measure for speech and audio. It can be
enhanced further by the additional step of A-weighting the spectra, so that
differences in frequency regions that are more audible are weighted more than
those in frequency regions that are inaudible. This yields a perceptually weighted
spectral distortion, which is used in practical systems that perform high-quality
speech and audio signal analysis.
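The weighting step described above can be sketched as follows. This is only one possible interpretation, not necessarily the method used in any particular system: adding the same A-weighting curve (in dB) to both S and P would cancel in the difference S-P, so the sketch instead uses the A-weighting power gain to weight each squared difference. The sample rate fs, the frame length N and the two test signals are assumptions made purely for illustration, and the constants are those of the standard A-weighting curve. The Hamming window is written out explicitly so that no toolbox is needed.

```matlab
% Sketch: plain and A-weighted spectral distortion between two frames.
% fs, N and the test signals are illustrative assumptions.
fs = 8000;                               % assumed sample rate in Hz
N  = 256;                                % assumed frame length (even)
n  = (0:N-1)';                           % column vector of sample indices
s  = sin(2*pi*440*n/fs) + 0.5*sin(2*pi*1000*n/fs);   % first frame
p  = s + 0.05*sin(2*pi*3000*n/fs);                   % second frame, slightly altered

w  = 0.54 - 0.46*cos(2*pi*n/(N-1));      % Hamming window, equivalent to hamming(N)

S  = fft(s.*w);
P  = fft(p.*w);
S  = 20*log10(abs(S(1:N/2)) + eps);      % eps guards against log10(0)
P  = 20*log10(abs(P(1:N/2)) + eps);

SD = mean((S-P).^2);                     % unweighted measure, as in the text

% Linear A-weighting gain RA at the centre frequency of each bin
f  = (0:N/2-1)'*fs/N;
f2 = f.^2;
RA = (12194^2 * f2.^2) ./ ...
     ((f2 + 20.6^2) .* sqrt((f2 + 107.7^2).*(f2 + 737.9^2)) .* (f2 + 12194^2));

g   = RA.^2;                             % power gain used as the per-bin weight
SDw = sum(g.*(S-P).^2) / sum(g);         % weighted mean of squared dB differences
```

Because g is zero at DC and small at low frequencies, differences in those bins contribute little to SDw, while differences in the more audible mid-frequency bins dominate.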
3.3.3 Measurement of speech intelligibility
Intelligibility is also best measured by a panel of listeners, and relates to the ability
of listeners to correctly identify words, phrases or sentences. An articulation test is
similar, but applies to the understanding of individual phonemes (vowels or consonants)
in monosyllabic or polysyllabic real or artificial words. Several common methods of
evaluation exist but those standardised by ANSI (in standard S3.2-1989) dominate. Some
example evaluative procedures are listed here along with references that provide more
information (unless noted, see [16] for further details):
• diagnostic rhyme test (DRT) [17] – asking listeners to distinguish between two words
rhyming by initial, such as {freak, leak};
• modified rhyme test (MRT) – asking listeners to select one of six words, half differing
by initial and half by final, such as {cap, tap, rap, cat, tan, rat};
• phonetically balanced word lists – presenting listeners with 50 sentences of 20 words
each, and asking them to write down the words they hear;
• diagnostic medial consonant test;
• diagnostic alliteration test;
• ICAO spelling alphabet test;
• two-alternative forced choice [18] – a general test category that includes the DRT;
• six-alternative rhyme test [18] – a general test category that includes the MRT;

• four-alternative auditory feature test [17] – asking listeners to select one of four words,
chosen to highlight the intelligibility of the given auditory feature;
• consonant-vowel-consonant test [19,20,11] – a test of a vowel sandwiched between
two identical consonants, with the recognition of the vowel being the listeners'
task. For example {tAt}, {bOb};
• general sentence test [11] – similar to the phonetically balanced word list test, but
using self-selected sentences that may be more realistic in content (and in context of
what the test is trying to determine);
• general word test [5] – asking listeners to write down each of a set of (usually 100)
spoken words, possibly containing realistic words.
Clearly intelligibility may be tested in terms of phonemes, syllables, words, phrases,
sentences, paragraph meaning, and any other arbitrarily grouped, measured recognition
rate. In general we can say that the smaller the unit tested, the more able we are to
relate the results to individual parts of speech. Unfortunately, no reliable method has
so far been developed for extrapolating from, for example, the results of a phoneme test
to the effectiveness of sentence recognition (although if you know the cause of
intelligibility loss in a particular system, you could make a good guess).
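Whatever unit is tested, the result is ultimately a recognition rate. As a minimal sketch of how a closed-set test such as the MRT might be scored, the fragment below computes the raw percentage correct and a chance-corrected score. The correction for guessing, (R - W/(n-1))/T, is a common psychometric adjustment rather than something taken from this text, and the response data are invented purely for illustration.

```matlab
% Sketch: raw and chance-corrected scores for a closed-set word test.
% All data below are hypothetical.
nAlternatives = 6;                        % six choices per trial, as in the MRT
presented = [1 3 2 6 4 5 1 2 3 4];        % index of the word actually spoken
chosen    = [1 3 5 6 4 5 2 2 3 4];        % index the listener selected

nTrials = numel(presented);
R = sum(chosen == presented);             % number of correct responses
W = nTrials - R;                          % number of wrong responses

rawScore = 100 * R / nTrials;                          % recognition rate, percent
adjScore = 100 * (R - W/(nAlternatives-1)) / nTrials;  % corrected for guessing
```

With two alternatives, as in the DRT, the same correction reduces to 100*(R-W)/nTrials, so a listener who guesses at random scores close to zero rather than 50%.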
3.3.4 Contextual information, redundancy and vocabulary size
Everyday experience indicates that contextual information plays an important role
in the understanding of speech, often compensating for an extreme lack of original
information. For example the sentence:
‘He likes to xxxxx brandy’
can easily be understood through guessing even though a complete word is missing
(‘drink’).
The construction of sentences is such that the importance of missing words is very
difficult to predict. It is hard to know in advance whether the start, middle or end of a
sentence will be more critical to its understanding. For example the missing word ‘stop’
differs in both importance and predictability in the two sentences:
‘She waited in the long queue at the bus xxxx’
and
‘As the car sped towards him he shouted xxxx!’
Contextual information may be regarded as being provided by surrounding words which
constrain the choice of the enclosed word, or on a smaller scale, by the surrounding
syllables which constrain the choice of a missing or obscured syllable (as certain
combinations appear very infrequently, or not at all, in the English language). Vocabulary
size reduction also causes a similar constraint, and it is noticeable that most people will
restrict their vocabulary to simple words when communications are impaired: eloquence
is uncommon in highly noisy environments.
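The constraining effect of vocabulary size can be illustrated with a toy sketch. Suppose a listener hears only the first and last sounds of a four-letter word; the number of words consistent with what was heard depends on how large the vocabulary is. Both word lists and the pattern are hypothetical, and matching on spelling is only a crude stand-in for matching on sound.

```matlab
% Sketch: a smaller vocabulary leaves fewer candidates for an obscured word.
% Both word lists are hypothetical and chosen only for illustration.
largeVocab = {'stop','step','shop','ship','slop','stem','stay','star','spot','snap'};
smallVocab = {'stop','go','wait','yes','no'};

pattern = '^s..p$';    % initial 's' and final 'p' heard, two middle letters obscured

% regexp with 'once' returns an empty entry for each word that does not match
isMatchLarge = ~cellfun(@isempty, regexp(largeVocab, pattern, 'once'));
isMatchSmall = ~cellfun(@isempty, regexp(smallVocab, pattern, 'once'));

matchLarge = largeVocab(isMatchLarge);    % candidates in the large vocabulary
matchSmall = smallVocab(isMatchSmall);    % candidates in the restricted vocabulary
```

Here the large list leaves several candidates, whereas the restricted list leaves only 'stop', which is the sense in which a reduced vocabulary acts like additional context.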
