IM2.AP (Audio processing)
The IM2 domain is characterised by audio signals captured by lapel and headset microphones, microphone arrays and binaural recordings, where it is frequently non-trivial to identify which speaker or speakers are talking at a particular time. Automatic Speech Recognition (ASR) is difficult in the meeting environment: beyond the issues arising from far-field microphones and multiple sound sources, speech in meetings is conversational, characterised by phenomena such as disfluencies and incomplete utterances. Additional challenges arise from a high proportion of accented speech from non-native speakers and from a multilingual orientation. Finally, we are concerned with the automatic extraction of metadata, such as speaker identity and "punctuation" information.
The major objectives for Phase II of IM2.AP are summarised as follows:
Last modified 2006-02-03 15:36