Today, we live in a world where breaking news is almost instantly captured by the crowd and shared on social networks such as Twitter, Instagram, or TikTok. Yet media companies are currently unable to extract meaningful information effectively from the deluge of visual data uploaded online every second. The VITA laboratory develops such technologies; in 2019, for example, they were used to estimate the size of the crowd at the 14 June women’s strike in Geneva from images and videos captured by demonstrators. The resulting article, published by Heidi.news, even called the official estimates into question.
In this project, the VITA laboratory, in partnership with RTS, aims to develop computationally efficient methods that go beyond crowd counting and extract a rich set of semantics, ranging from lists of objects to human actions and their relationships. The proposed technology will allow complex metadata to be extracted automatically from any image or video. Because the methods operate in real time on live or crowdsourced content, this information will enable journalists to enhance the quality of their coverage and to produce new, innovative formats. Content recommendation and content retrieval systems will also benefit greatly from the enriched metadata.