Information Indexing & Retrieval
Partners
- University of Geneva, Computer Vision and Multimedia Laboratory (IP leader): Stéphane Marchand-Maillet and Thierry Pun, http://vision.unige.ch/
- Swiss Federal Institute of Technology, Signal Processing Laboratory: Murat Kunt, http://ltswww.epfl.ch/
- IDIAP: Hervé Bourlard, http://www.idiap.ch/
- University of Geneva, Translation and Interpretation School: Susan Armstrong, http://www.issco.unige.ch/
Context and Goals
The main goal of IM2.IIR is to provide multimedia data indexing and multimedia/multimodal querying mechanisms that will facilitate subsequent multimedia data retrieval and management. As an example, suppose a user wishes to retrieve video clips from a video database. This user could formulate a multimodal query composed of a sample image and caption roughly corresponding to what they are looking for, a descriptive sentence spoken into a microphone, and a hand gesture, performed in front of the user's webcam, that should appear in the video clip being sought. The system should then be able to retrieve video clips matching this description.
No such system exists (yet!). Currently, most content-based information management systems address a single medium only (e.g. images, or video, or text). Considering several media simultaneously will improve the system's performance by drawing on more information, and will also create new capabilities by exploiting the information that emerges from combining media. This system is therefore expected to handle semantics better than the current state of the art. Furthermore, enabling multimodal queries will allow for more human-like interaction with the system, which is one of the fundamental goals of (IM)2.
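To make the combination of media concrete, here is a minimal sketch of late fusion, one common way to merge evidence from several media: each mono-media engine scores the candidate clips independently, and a weighted sum merges the scores into a single ranking. All names and weights are illustrative assumptions, not part of the IM2.IIR system.

```python
# Minimal late-fusion sketch (illustrative only; names and weights are
# assumptions, not IM2.IIR code). Each mono-media engine returns similarity
# scores per candidate clip; a weighted sum merges them into one ranking.

def fuse_scores(per_medium_scores, weights):
    """per_medium_scores: {medium: {clip_id: score}}; weights: {medium: float}."""
    fused = {}
    for medium, scores in per_medium_scores.items():
        w = weights.get(medium, 0.0)
        for clip_id, score in scores.items():
            fused[clip_id] = fused.get(clip_id, 0.0) + w * score
    # Rank candidate clips by fused score, best first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Example: scores from hypothetical image, speech, and gesture engines.
ranking = fuse_scores(
    {
        "image":   {"clip1": 0.9, "clip2": 0.4},
        "speech":  {"clip1": 0.6, "clip3": 0.8},
        "gesture": {"clip2": 0.7, "clip3": 0.5},
    },
    weights={"image": 0.5, "speech": 0.3, "gesture": 0.2},
)
print(ranking)  # clip1 ranks first (0.63); clip2 and clip3 are nearly tied (0.34)
```

A clip that matches several modalities at once rises above clips that match only one, which is precisely the extra capability that combining media is expected to bring.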
Research Issues
With respect to the existing state of the art, the emphasis of IM2.IIR will be on:
- multimedia data: how to take into account, and if possible benefit from, the fact that data is not limited to a single medium;
- user interaction: how to benefit from the user's actions, ultimately provided in a multimodal manner. These actions might happen at different times: before the first query in order to define it, and during the query refinement process through continuous interaction and by means of mechanisms such as relevance feedback (see the sketch after this list);
- interoperability: how to interconnect the various IIR tools already developed by the partners of this IP. Interoperability has two meanings here: between media (i.e. between image, text, sound, etc.) as well as between levels (i.e. audio signal, recognized speech, high-level semantic interpretation).
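As an illustration of the relevance-feedback mechanism mentioned in the user-interaction item above, the following sketch applies Rocchio-style refinement to a query feature vector. The parameter values are conventional defaults, and none of this code is taken from the actual IIR engines.

```python
# Rocchio-style relevance feedback sketch (illustrative; the real engines
# may use a different scheme). The query vector is moved toward results
# the user marked relevant and away from those marked non-relevant.

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """All arguments except the weights are equal-length feature vectors."""
    def centroid(vectors):
        n = len(vectors)
        return [sum(xs) / n for xs in zip(*vectors)] if n else [0.0] * len(query)

    rel_c = centroid(relevant)
    nonrel_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * s
            for q, r, s in zip(query, rel_c, nonrel_c)]

# One feedback round: the user marked two results relevant, one non-relevant.
new_query = rocchio(
    query=[0.2, 0.8, 0.0],
    relevant=[[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
)
print(new_query)  # approx. [0.8, 0.95, -0.15]: drifts toward the relevant centroid
```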
Year 1: establish a common environment based on MRML (Multimedia Retrieval Markup Language) that will encompass the various mono-media information retrieval engines developed in the participating laboratories (speech, text, audio, video). This means: specifying the data flows (e.g. data, interaction flow, relevance, low-level features, high-level ontologies); extending MRML to allow for multimodal multimedia information retrieval; defining feature extraction protocols and tools; and adapting the individual IIR engines to the common framework.
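As a rough illustration of what adapting the individual IIR engines to a common framework could look like, the sketch below wraps each mono-media engine behind one shared query interface. The interface and method names are invented for illustration and are not the MRML specification.

```python
# Illustrative adapter sketch (hypothetical interface; not the MRML spec).
# Each mono-media engine is wrapped behind one common query API so the
# framework can dispatch queries to any engine uniformly.

from abc import ABC, abstractmethod

class RetrievalEngine(ABC):
    """Common interface every wrapped mono-media engine implements."""

    medium = "unspecified"  # e.g. "text", "image", "speech", "video"

    @abstractmethod
    def query(self, features, top_k=10):
        """Return up to top_k (document_id, score) pairs for a
        medium-specific feature representation."""

class TextEngineAdapter(RetrievalEngine):
    medium = "text"

    def __init__(self, backend):
        self.backend = backend  # the existing, lab-specific engine object

    def query(self, features, top_k=10):
        # Translate the common call into the backend's own API
        # (the method name 'search' is a hypothetical placeholder).
        return self.backend.search(features)[:top_k]
```

With one such adapter per laboratory engine, the surrounding framework only ever sees the common interface, which is what makes a shared environment across speech, text, audio, and video engines feasible.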
Year 2: this phase should lead to the integration of more than one medium (e.g. sound and video, text and images) for IIR. Steps taken will include: establishing links with the annotation effort; specifying combined data formats; defining a combined multimedia query format; initiating the learning of user preferences, via relevance feedback, to permit personalization; and using the data from the meeting room.
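One hypothetical shape for the combined multimedia query format mentioned above, with per-medium parts that could feed a fusion step like the one sketched earlier; all field names are invented for illustration.

```python
# Hypothetical combined-query structure (field names invented; the actual
# format would be defined within the project, e.g. as an MRML extension).

from dataclasses import dataclass

@dataclass
class MediaPart:
    medium: str          # "image", "text", "speech", ...
    features: list       # medium-specific features (vector, tokens, ...)
    weight: float = 1.0  # relative importance in the fused ranking

@dataclass
class MultimediaQuery:
    parts: list          # one MediaPart per medium involved in the query
    top_k: int = 10      # how many results the user wants back

# A query combining an example image with a textual description.
query = MultimediaQuery(parts=[
    MediaPart("image", features=[0.1, 0.4, 0.2], weight=0.6),
    MediaPart("text", features=["beach", "sunset"], weight=0.4),
])
```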
Year 3: extension towards other IPs in order to integrate their modules and set up a vertical demonstrator. The following tasks are envisaged: continuing research on user preference learning; allowing multimodal input queries; and including device dependencies in the system.
Year 4: this final year of the project will mostly be oriented toward making the system truly multimodal. In particular, we want to enable the use of multimodal queries, make the system flexible with respect to various output devices, and actually use biointeraction.
Links
Software download:
- GIFT (Gnu Image Finding Tool): http://www.gnu.org/software/gift/
- MRML (Multimedia Retrieval Markup Language): http://www.mrml.net/
- Torch (C++ Machine learning library): http://www.torch.ch/
Publications:
- IDIAP: http://www.idiap.ch/
- Swiss Federal Institute of Technology, Signal Processing Laboratory: http://ltswww.epfl.ch/publications.html
- University of Geneva, Computer Vision and Multimedia Laboratory: http://vision.unige.ch/publications/index.html
- University of Geneva, Translation and Interpretation School: http://www.issco.unige.ch/publications/
Quarterly status reports
Available on the local site (password protected).