Deep Video Understanding Call for Papers

2nd ACM Multimedia Grand Challenge
on Deep Video Understanding
(Oct. 20 - 24, 2021)

Deep video understanding is a difficult task that requires systems to
analyze and understand the relationships between different entities
in a video, to use known information to reason about other, less
explicit information, and to populate a knowledge graph (KG) with all
acquired information. To work on this task, a system should take into
consideration all available modalities (speech, image/video, and in
some cases text). The aim of this challenge is to push the limits of
multimodal extraction, fusion, and analysis techniques to address the
problem of analyzing long-duration videos holistically and extracting
useful knowledge that can be used to answer different types of
queries. The target knowledge includes both visual and non-visual
elements. As video and other multimedia data become increasingly
popular and useful across domains, the research, approaches, and
techniques explored in this Grand Challenge will be highly relevant in
the coming years.

Challenge Overview:

Interested participants are invited to apply their approaches and
methods to a novel, extended Deep Video Understanding (DVU) dataset
made available by the challenge organizers. The dataset includes the
10 Creative Commons licensed movies from the 2020 edition of this
challenge (HLVU), supplemented with the Land Girls TV series, licensed
by the BBC for use in this challenge, and additional Creative Commons
licensed movies added for the 2021 challenge. The dataset will be
annotated by human assessors, and final ground truth will be provided
for 50% of the dataset to participating researchers for training and
developing their systems. Ground truth will be available both at the
overall movie level (ontology of relations, entities, actions and
events; Knowledge Graph; and names and images of all main characters)
and at the individual scene level (ontology of locations,
people/entities, their attributes, and the interactions between them).
The organizers will support evaluation and scoring for a hybrid of
main query types, at both the overall movie level and the individual
scene level, distributed with the dataset (please refer to the dataset
webpage for more details):

Example Question Types at the Overall Movie Level:

1- Multiple-choice question answering on part of the Knowledge Graph
for selected movies.

2- Possible path analysis between persons / entities of interest in a
Knowledge Graph extracted from selected movies.

3- Fill in the Graph Space - Given a partial graph, systems will be
asked to fill in the graph space.

Example Question Types at the Individual Scene Level:

1- Find the next or previous interaction, given two people, a specific
scene, and the interaction between them.

2- Find a unique scene given a set of interactions and a scene list.

3- Fill in the Graph Space - Given a partial graph for a scene,
systems will be asked to fill in the graph space.

4- Match selected scenes to a set of scene descriptions written in
natural language.

Challenge Website:

Important Dates:

Complete HLVU annotations for the development and testing data
used in 2020 available:

DVU development data release: April 19, 2021
Testing dataset release: May 1, 2021
Testing queries release: June 6, 2021
Run submissions due to organizers: July 11, 2021
Paper submission deadline: July 11, 2021
Results released back to participants: TBD
Notification to authors: TBD
Camera-ready submission: TBD
ACM Multimedia dates: October 20 - 24, 2021

Thank you!
The DVU2021 Organizers