IM2.AP ( Audio processing )

Audio processing


IP Head: John Dines (IDIAP)
The IM2 domain is characterised by audio signals captured by lapel and headset microphones, microphone arrays and binaural recordings, in which it is frequently non-trivial to identify which speaker or speakers are talking at a particular time. Automatic Speech Recognition (ASR) is difficult in the meeting environment: beyond the issues arising from far-field microphones and multiple sound sources, speech in meetings is conversational, characterised by phenomena such as disfluencies and incomplete utterances. Additional challenges arise from a high proportion of accented speech from non-native speakers and from the multilingual setting. Finally, we are concerned with the automatic extraction of metadata, such as speaker identity and "punctuation" information.

The major objectives for Phase II of IM2.AP are summarised as follows:
  • The continued advancement of fundamental techniques in audio processing and their application to ASR in applied (especially meeting) environments
  • Encouraging greater intra- and interdisciplinary cooperation in advancing the state of the art
  • Investigation and rigorous 'proof-of-concept' of new paradigms in audio processing, in particular for speech recognition and speaker recognition tasks
  • Evaluation on more realistic and complex tasks

Last modified 2006-02-03 15:36