About CASAM
CASAM stands for Computer-Aided Semantic Annotation of Multimedia.
Why we need better metadata: To make the most of multimedia, people need to know exactly what is inside videos, audio clips, or photos.
Example: A video entitled “President Obama arrives in Dresden”. Obvious content includes: ‘Air Force One’ landing and Obama taking an armoured limousine to the city centre. But there are more details not mentioned in the caption: footage of Dresden airport along with rainy conditions, Obama leaving with entourage, bystanders watching, motorcade with police escort, etc. There are many details that could influence your decision to choose this video. You may also want to use it for different reasons. Maybe you are a TV producer making a segment on Dresden airport and need footage in certain conditions. Or you need shots of a Dresden street, but without cars on it.
To exploit these details without watching the full-length video, you need searchable annotation (tags, in other words). With tags, you can search the database of videos as you would search the web and find exactly what you need.
Situation today: This job of annotation is currently performed by humans: by specialised archivists and librarians, sometimes by journalists, editors or even the general public (social tagging as on Delicious or Flickr). But human annotation has its drawbacks: it takes a lot of time and expertise if you want to do it well. This means either high costs or questionable reliability. When users add tags, each may use different categories or different spellings for the same thing, making the content harder to retrieve.
Need for better processes: There is a wealth of footage in broadcasters’ archives and multimedia databases around the world. But these treasure troves remain hidden because nobody knows they exist, let alone how to search them effectively.
CASAM will develop a remedy for the double problem of expensive manual work and unexploited multimedia resources. The CASAM software will use state-of-the-art computer technology to analyse multimedia sources automatically and provide detailed and systematic sets of descriptions of their content.
Combined power of humans and computers: Technology is not advanced enough to achieve optimal annotation results on its own. Take machine translation, for instance. There is always a lot of uncertainty that a person can resolve immediately but a computer cannot yet understand. So CASAM does not try to be completely autonomous. Rather, it will analyse content as far as it can and then put carefully selected clarification questions to human experts. Continuing the Obama in Dresden example, the Presidential limousine is unusually elongated but with no clear markings. CASAM may ask its user: Is this a specialised vehicle for a high-ranking person?
CASAM provides the most efficient combination of automatic and human annotation, saving time and money. It is the best way to exploit multimedia content for primary and secondary usage: helping editors choose video footage, soundbites, and images for any kind of production and helping freelance journalists and producers to sell their work to media outlets. It can unlock dormant material for re-use in new contexts, whether snippets or entire videos.
While CASAM is meant as a search and retrieve tool for professional broadcasters, producers, and online media, it could one day be rolled out for use by the general public.
CASAM is a semantic tool. It recognises items of content and their relations by looking at the sense they make in combination: when many people use their umbrellas in a given situation, it is safe to assume that it is raining, and the other way round. Where a plane lands, it is usually at an airport. A motorcade is used during visits of high-ranking persons such as heads of state. A blue and white wide-body airplane marked “United States of America” can likely be identified as Air Force One. At the time of shooting the footage, Barack Obama was President of the United States. And so on.
CASAM uses all available sources to infer as much information as possible. It uses sound and speech analysis, character recognition, identification of images and faces, etc., and brings them all together. Its flexible structure, or ontology, is based on a set of concepts and their relations to each other (as described in the previous paragraph). These relations do not necessarily hold in both directions: while there may be rain during a state visit, not all state visits happen in rainy weather. But the fact that it is raining always means that the venue is outdoors.
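The kind of inference described above can be illustrated with a minimal sketch. It assumes a simple rule table and forward chaining; the rule names and the `infer` function are hypothetical and chosen for illustration only, and they do not reflect CASAM's actual ontology or reasoning engine.

```python
# Hypothetical rules of the form (premises, conclusion).
# These names are illustrative assumptions, not CASAM's real ontology.
RULES = [
    ({"many_umbrellas_open"}, "raining"),
    ({"raining"}, "outdoor_venue"),
    ({"plane_landing"}, "airport"),
    ({"motorcade"}, "high_ranking_visit"),
    ({"blue_white_widebody", "marked_united_states_of_america"}, "air_force_one"),
]


def infer(observations):
    """Apply the rules repeatedly until no new concept can be derived."""
    facts = set(observations)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            # A rule fires when all of its premises are already known.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


detected = {"many_umbrellas_open", "plane_landing", "motorcade"}
derived = infer(detected) - detected
print(sorted(derived))
```

Because each rule only runs from premises to conclusion, a motorcade implies a high-ranking visit but never implies rain, which mirrors the one-directional relations in the text.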
CASAM uses human input on a case-by-case basis to learn by itself and to expand and upgrade its ontology. It adds new concepts (cars can, however rarely, be elongated like limousines) and new relations (since black limousines have been identified in motorcades of the US President, the question “limousine yes or no?” is not needed in future similar contexts), and thus becomes increasingly “intelligent”.
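As a rough sketch of this ask-once-then-remember behaviour, the following assumes that learned answers are stored in a plain dictionary keyed by observation and context. The function `resolve`, the `learned` store and the `fake_expert` stand-in are all hypothetical names introduced for illustration, not part of the CASAM system.

```python
def resolve(observation, context, learned, ask):
    """Return a label for an observation, asking a human only when needed.

    learned: dict mapping (observation, context) to a label given earlier
    ask:     callable that puts a clarification question to a human expert
    """
    key = (observation, context)
    if key in learned:
        # A similar case was clarified before, so no question is needed.
        return learned[key]
    answer = ask(f"Seen '{observation}' in context '{context}'. What is it?")
    learned[key] = answer  # remember the answer for future similar contexts
    return answer


learned = {}
questions = []


def fake_expert(question):
    """Stand-in for the human expert; records each question it receives."""
    questions.append(question)
    return "limousine"


first = resolve("elongated black car", "presidential motorcade", learned, fake_expert)
second = resolve("elongated black car", "presidential motorcade", learned, fake_expert)
print(first, second, len(questions))
```

The second call returns the stored answer without consulting the expert, so only one question is ever asked for this observation and context.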
CASAM comes in three modules:
(1) a knowledge database and analysis tool that looks directly at the content,
(2) a module that processes the information derived from the initial knowledge-based analysis, asks for user input where necessary and integrates that input into the ontology and the annotation, and
(3) the user interface.
Work on CASAM started in April 2008. The CASAM project ended in March 2011.
Researchers have carried out significant work on Knowledge-driven Multimedia Analysis, Reasoning and User Interface Design, which has led to a successful project outcome and several publications.
The CASAM Annotation System final prototype has been deployed and evaluated.
The CASAM concept of Human-Machine Synergy for Multimedia Analysis has been proved.
The CASAM project was implemented by a consortium of commercial, academic, and non-profit partners from Greece, the UK, Germany, Portugal and the Netherlands. The consortium comprised three research institutes, two software companies, and three media organisations representing the prospective users of the system.
The project was co-funded by the European Union under the Seventh Framework Programme for Research (FP7) and supervised by the European Commission’s Information Society and Media Directorate-General.