Deep Video Understanding Grand Challenge at ACM Multimedia 2022 Call for Papers

Call For Participation

Challenge Website:

Deep video understanding is a difficult task which requires systems to
develop a deep analysis and understanding of the relationships between
different entities in video, to use known information to reason about
other, more hidden information, and to populate a knowledge graph (KG)
representation with all acquired information. To work on this task, a
system should take into consideration all available modalities
(speech, image/video, and in some cases text).

The aim of this challenge series is to push the limits of multimodal
extraction, fusion, and analysis techniques to address the problem of
analyzing long-duration videos holistically and extracting useful
knowledge that can be used to answer different types of queries. The
target knowledge includes both visual and non-visual elements. As
video and multimedia data become increasingly popular and usable
across different domains and contexts, the research, approaches, and
techniques applied in this Grand Challenge will be highly relevant in
the coming years.

Challenge Overview:

Interested participants are invited to apply their approaches and
methods to an extended, novel Deep Video Understanding (DVU) dataset
made available by the challenge organizers. The dataset consists of a
development set of 14 movies with Creative Commons licenses from the
2020-2021 versions of this challenge, and a new set of 10 movies
licensed from the KinoLorberEdu platform. Four of the 10 new movies
will be added to the 14 development movies, while the remaining 6 will
serve as the testing data in 2022. The development data includes:
original whole videos, segmented scene shots, image examples of main
characters and locations, a movie-level KG representation of the
relationships between main characters and between characters and key
locations, a scene-level KG representation of each scene in a movie
(location type, characters, interactions between them, order of
interactions, sentiment of scene, and a short textual summary), and a
global shared ontology of locations, relationships (family, social,
work), interactions, and sentiments.

The organizers will support evaluation and scoring for a mix of main
query types, distributed with the dataset, at the overall movie level
and at the individual scene level. Participants may submit results for
the movie-level queries, the scene-level queries, or both. Within each
category, queries are grouped to allow more flexible submission
options.

More details on the queries and dataset are available here:

Example Question types at Overall Movie Level:

    Multiple choice question answering on part of Knowledge Graph for selected movies.
    Possible path analysis between persons / entities of interest in a Knowledge Graph extracted from selected movies.
    Fill in the Graph Space - Given a partial graph, systems will be asked to fill in the graph space.
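
As an illustration of the path analysis query type listed above, here is a minimal sketch in Python. It is not the organizers' query format or scoring code. The character names, relation labels, triple layout, and the `find_paths` helper are all hypothetical, and it assumes the movie-level KG can be read as directed (subject, relation, object) triples.

```python
from collections import deque

# Hypothetical movie-level KG as (subject, relation, object) triples.
# Names and relation labels are made up for illustration only.
TRIPLES = [
    ("Anna", "sister_of", "Ben"),
    ("Ben", "works_for", "Carla"),
    ("Carla", "friend_of", "Dan"),
    ("Anna", "friend_of", "Dan"),
]


def build_adjacency(triples):
    """Map each entity to its outgoing (relation, neighbor) edges."""
    adjacency = {}
    for subj, rel, obj in triples:
        adjacency.setdefault(subj, []).append((rel, obj))
        adjacency.setdefault(obj, [])
    return adjacency


def find_paths(triples, source, target, max_hops=3):
    """Return all simple directed paths from source to target.

    Each path is a list of (subject, relation, object) edges, found by
    breadth-first search, so shorter paths come first.
    """
    adjacency = build_adjacency(triples)
    paths = []
    queue = deque([(source, [source], [])])  # (node, visited nodes, edges so far)
    while queue:
        node, visited, edges = queue.popleft()
        if node == target and edges:
            paths.append(edges)
            continue
        if len(edges) == max_hops:
            continue
        for rel, neighbor in adjacency.get(node, []):
            if neighbor not in visited:  # keep paths simple (no repeated entity)
                queue.append(
                    (neighbor, visited + [neighbor], edges + [(node, rel, neighbor)])
                )
    return paths


if __name__ == "__main__":
    for path in find_paths(TRIPLES, "Anna", "Dan"):
        print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

Breadth-first search is used here only because it returns the shortest connections first; a real system would first have to extract the graph from the video.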

Example Question types at Individual Scene Level:

    Find next or previous interaction, given two people, a specific scene, and the interaction between them.
    Find a unique scene given a set of interactions and a scene list.
    Fill in the Graph Space - Given a partial graph for a scene, systems will be asked to fill in the graph space.
    Match between selected scenes and a set of scene descriptions written in natural language.
    Scene sentiment classification.
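
To make the first scene-level query type above more concrete, here is a small Python sketch of a next/previous interaction lookup. It is an illustration under stated assumptions, not the official query format: the scene contents, the tuple layout, and the `neighbor_interaction` helper are hypothetical, and it assumes the answer is the adjacent interaction between the same two people in the scene's ordered interaction list.

```python
# Hypothetical scene-level annotation: interactions in the order they occur,
# each as (source person, interaction, target person). Illustrative data only.
SCENE = [
    ("Anna", "greets", "Ben"),
    ("Ben", "asks", "Anna"),
    ("Anna", "hugs", "Ben"),
    ("Carla", "interrupts", "Ben"),
]


def neighbor_interaction(scene, person_a, person_b, interaction, direction="next"):
    """Return the interaction between the two people that comes right
    after ("next") or right before ("previous") the given interaction,
    or None if there is no such interaction.
    """
    if direction not in ("next", "previous"):
        raise ValueError("direction must be 'next' or 'previous'")
    pair = {person_a, person_b}
    # Keep only interactions involving exactly these two people, in scene order.
    between = [step for step in scene if {step[0], step[2]} == pair]
    for index, (_, label, _) in enumerate(between):
        if label == interaction:
            other = index + 1 if direction == "next" else index - 1
            return between[other] if 0 <= other < len(between) else None
    return None


if __name__ == "__main__":
    print(neighbor_interaction(SCENE, "Anna", "Ben", "asks", "next"))
    print(neighbor_interaction(SCENE, "Anna", "Ben", "asks", "previous"))
```

In the challenge itself the ordered interactions would have to be inferred from the video, so this lookup only shows the shape of the question, not how to answer it from raw footage.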

A new addition to the 2022 challenge is that, for some queries, systems
will be asked to submit with their results a temporal segment from the
movie or scene (e.g. using starting/ending timestamps) to act as
evidence for their answers. This requirement will be evaluated
independently from the main scoring method, and its objective is to
determine whether systems can explain their results and whether they
are arriving at their answers for the correct reasons.

Important Dates:

    DVU development data release: Available from This URL

    Testing dataset release: TBD (Coming Soon)

    Testing queries release: TBD

    Run submissions due to organizers: TBD

    Paper submission deadline: TBD

    Results released back to participants: TBD

    Notification to authors: TBD

    Camera-ready submission: July 24th, 2022

    ACM Multimedia dates: October 10 - 14, 2022

We hope you can join the challenge. For any questions, please email
the organizers directly:

Best Wishes
The DVU Grand Challenge Team