Fig. 6a. The average normalised values of each of the 26 mel-scale coefficients of the reference data and the new speaker data on the phoneme /I/.
Fig. 6b. Adaptation of an already evolved EFuNN from NZ English phoneme /I/ data to new-accent data on the same phoneme. After a single pass of additional evolving (adaptation) of the /I/ EFuNN on the new-accent data, 9 out of 10 frames from the new-accent data were correctly recognised (none of the 10 speech frames from the new accent had been recognised before the adaptation took place). The y-axis shows the output activation value of the adapted /I/ phoneme EFuNN on the new-accent data (10 vectors).
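The one-pass adaptation described in this caption can be illustrated with a much-simplified sketch of an evolving classifier. This is an assumption-laden toy, not the original EFuNN implementation: the class name EvolvingClassifier, the sensitivity threshold and the learning rate are all illustrative, and the fuzzy membership layers, node aggregation and pruning of a real EFuNN are omitted. It shows only the core evolving principle, namely that a single pass over new data either tunes the best-matching rule node or inserts a new one.

import numpy as np

class EvolvingClassifier:
    """Toy one-pass evolving classifier (EFuNN-like in spirit only)."""

    def __init__(self, sthr=0.9, lr=0.1):
        self.sthr = sthr   # sensitivity threshold for accepting a node match
        self.lr = lr       # learning rate for adjusting a matched node
        self.nodes = []    # rule-node centres (input prototypes in [0, 1]^d)
        self.outputs = []  # output value each rule node supports

    def _activation(self, x, w):
        # distance-based activation in [0, 1]; assumes normalised inputs
        return 1.0 - np.sum(np.abs(x - w)) / len(x)

    def adapt(self, X, y):
        # single pass over the data: adjust the best-matching node if it
        # is close enough, otherwise evolve (insert) a new rule node
        for x, target in zip(X, y):
            acts = [self._activation(x, w) for w in self.nodes]
            if acts and max(acts) >= self.sthr:
                j = int(np.argmax(acts))
                self.nodes[j] += self.lr * (x - self.nodes[j])
                self.outputs[j] += self.lr * (target - self.outputs[j])
            else:
                self.nodes.append(x.astype(float).copy())
                self.outputs.append(float(target))

    def predict(self, x):
        # output activation of the best-matching rule node
        assert self.nodes, "adapt() must be called before predict()"
        acts = [self._activation(x, w) for w in self.nodes]
        j = int(np.argmax(acts))
        return self.outputs[j] * acts[j]

Under these assumptions, the experiment in Fig. 6b would correspond to calling adapt() once on the reference /I/ frames and then once more, in a single additional pass, on the ten new-accent frames: the extra pass adds or tunes a few rule nodes without retraining on the original data.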
7. ECOS and EFuNNs for on-line, adaptive, multi-modal (speech, image, text) information processing
Several methods for multi-modal information processing that involve images (e.g., lip movement) to enhance speech recognition have been developed [23,55,73]. Other methods use speech to enhance image recognition. But when the multi-modal recognition (or identification) process has to be performed in a real-time, on-line, adaptive mode, most of the above methods fail to achieve satisfactory results. This is because of the processing speed required, and the need for an adaptation method that can adapt quickly to new data, some of which are presented only for a very short period of time, in a noisy environment. Here, a brief reference is made to AVIS, a framework for integrated auditory and visual information processing published in [43]. In the second sub-section, the use of ECOS and EFuNNs for the implementation of AVIS is discussed and directions for further implementations are given.