TREC Video Retrieval Evaluation (TRECVID 2020) Call for Papers


February 2020 - November 2020

Conducted by the National Institute of Standards and Technology (NIST)
with additional funding from other US government agencies.

I n t r o d u c t i o n:

The TREC Video Retrieval Evaluation series promotes
progress in content-based analysis of and retrieval from digital video
via open, metrics-based evaluation. TRECVID is a laboratory-style
evaluation that attempts to model real world situations or significant
component tasks involved in such situations. In its 20th annual evaluation
cycle, TRECVID will evaluate participating systems on 6 different video
analysis and retrieval tasks using various types of real-world datasets.
Below are the main datasets to be used in 2020 across the 6 proposed tasks.

D a t a:

In TRECVID 2020 NIST will use at least the following data sets:

      * Vimeo Creative Commons Collection (V3C)
      The V3C is a large-scale video dataset that has been collected
      from high-quality web videos with a time span over several years
      in order to represent true videos in the wild. It consists of
      28,450 videos with a duration of 3,801 hours in total.  The
      first part of this dataset (V3C1) has been used by the Video
      Browser Showdown (VBS) 2019 and the Ad-Hoc Video Search (AVS)
      task at TRECVID 2019 as well.  For both campaigns V3C1 will
      serve as a basis over three years (VBS 2019-2021 and TRECVID
      2019-2021). V3C1 contains 1,000 hours of video content and
      approximately one million shots, which were segmented by the
      authors of the dataset using the open-source multimedia retrieval
      engine Cineast. A subset of approx. 2000 clips from the second part
      (V3C2) of the V3C collection will be used as testing data for
      the Video-to-Text (VTT) task in 2020.
      * IACC.3

      The IACC.3 was introduced in 2016 and consists of approximately
      4600 Internet Archive videos (144 GB, 600 h) with Creative
      Commons licenses in MPEG-4/H.264 format with duration ranging
      from 6.5 min to 9.5 min and a mean duration of almost 7.8
      min. Most videos have some donor-provided metadata available,
      e.g., title, keywords, and description.

      * BBC EastEnders

      Approximately 244 video files (300 GB and 464 hours in total)
      with associated metadata, each containing a week's worth of BBC
      EastEnders programs in MPEG-4/H.264 format.

      * Twitter Vine videos
      URLs of approximately 8,000 six-second video clips from the
      public Twitter stream of Vine videos were annotated with
      human-written captions during 2016-2019. These Vine videos will
      be provided as development data for participants of the
      Video-to-Text (VTT) task.

      * Gatwick and i-LIDS MCT airport surveillance video

      The data consist of about 150 hours of airport surveillance
      video (courtesy of the UK Home Office). The Linguistic Data
      Consortium has provided event annotations for the entire
      corpus. The corpus was divided into development and evaluation
      subsets. Annotations for the 2008 development and test sets are
      available.

      * VIRAT dataset
      The VIRAT Video Dataset is a large-scale surveillance video
      dataset designed to assess the performance of activity detection
      algorithms in realistic scenes. The dataset was collected to
      facilitate both the detection of activities and the
      spatio-temporal localization of the objects associated with
      those activities in large continuous video. The VIRAT data are
      closely aligned with real-world video surveillance scenarios.

      * LADI dataset

      The Low Altitude Disaster Imagery (LADI) dataset is hosted as
      part of the AWS Public Dataset program and will be available to
      participants of the DSDI task as development data. It consists
      of over 20,000 annotated images, each at least 4 MB in
      size. The annotated features were selected based on a
      recommendation from the public safety community. In total there
      are 31 features across 5 categories. The dataset was collected
      between 2015 and 2019 during major natural disaster events
      (e.g., hurricanes, floods, earthquakes) across several US
      states. The low-altitude criterion is intended to further
      distinguish the LADI dataset from satellite or "top down"
      datasets and to support development of computer vision
      capabilities for small drones operating at low altitudes. A
      minimum image size was selected to maximize the efficiency of
      the crowdsourcing workers. For more information about LADI,
      please refer to the LADI GitHub organization.

T a s k s:

In TRECVID 2020 NIST will evaluate systems on the following tasks
using the [data] indicated:

    * AVS: Ad-hoc Video Search (automatic, manually-assisted,
    relevance feedback) [V3C1]

      The Ad-hoc search task started in TRECVID 2016 and will continue
      in 2020 to model the end-user search use case, in which a user is
      looking for segments of video containing persons, objects,
      activities, locations, etc., and combinations of these. Given
      about 30 multimedia topics created at NIST, systems must return,
      for each topic, all the shots that meet the video need it
      expresses, ranked in order of confidence. Although all evaluated
      submissions will be for automatic runs, interactive systems will
      have the opportunity to participate in the Video Browser Showdown
      (VBS) in 2021 using the same testing data (V3C1).

    * ActEV: Activities in Extended Video [VIRAT]

      ActEV is a series of evaluations to accelerate development of
      robust, multi-camera, automatic activity detection algorithms
      for forensic and real-time alerting applications. ActEV is an
      extension of the annual TRECVID Surveillance Event Detection
      (SED) evaluation in which systems will also detect and track
      objects involved in the activities. Each evaluation will
      challenge systems with new data, system requirements, and/or new
      activities.

    * INS: Instance search (interactive, automatic) [BBC EastEnders]

      An important need in many situations involving video collections
      (archive video search/reuse, personal video organization/search,
      surveillance, law enforcement, protection of brand/logo use) is to
      find more video segments of a specific person, object, or place,
      given a visual example. A new query type, introduced in 2019,
      asks systems to retrieve specific persons performing specific
      actions. A set of defined actions with various image/video
      examples will be given, and each topic will include a few
      examples (image and video) of a person and ask systems to find
      that person performing one of the defined actions.

    * VTT: Video to Text Description [Vimeo Creative Commons
    Collection (V3C2)]

      Automatic annotation of videos using natural language text
      descriptions has been a long-standing goal of computer
      vision. The task involves understanding many concepts such as
      objects, actions, scenes, person-object relations, the temporal
      order of events, and many others. In recent years there have
      been major advances in computer vision techniques that have
      enabled researchers to begin working on this problem in
      practice. Given a set of short video clips and a number of
      reference sets of text descriptions, systems are asked to submit
      results for two subtasks. The core "Description Generation"
      subtask requires systems to automatically generate a text
      description (1 sentence) for each video clip. An optional
      "Matching and Ranking" subtask requires systems to return, for
      each video and each reference set, a ranked list of the text
      descriptions most likely to correspond to (i.e., to have been
      annotated for) that video.

    * VSUM: Video Summarization [BBC EastEnders Soap Opera]

      An important need in many situations involving video collections
      (archive video search/reuse, personal video organization/search,
      movies, TV shows, etc.) is to summarize the video in order to
      reduce its size and concentrate the high-value information in
      the video track. In 2020 we begin a new video summarization
      track in TRECVID in which the task is to summarize the major
      life events of specific characters over a number of weeks of
      programming on the BBC EastEnders TV series. Typically, three
      characters will be chosen for this task every year, and the
      summaries of their major life events must fall within a
      selected period of the show, which will be specified to
      participants in advance of the task.

    * DSDI: Disaster Scene Description and Indexing [Low Altitude
    Disaster Imagery (LADI)]

      Computer vision capabilities have been advancing rapidly and are
      expected to become an important component of incident and
      disaster response. However, the majority of computer vision
      capabilities do not yet meet public safety needs, such as
      support for search and rescue, due to the lack of appropriate
      training data and requirements. In response, the organizers
      developed a dataset of images of various natural disasters
      collected by the Civil Air Patrol. Two key distinctions are the
      low-altitude, oblique perspective of the imagery and its
      disaster-related features, both of which rarely appear in
      computer vision benchmarks and datasets. This task invites
      researchers to work in this new domain to develop new
      capabilities and close the performance gap; in essence, systems
      must label short video clips with the correct disaster-related
      feature(s).

In addition to the data, TRECVID will provide uniform scoring
procedures and a forum for organizations interested in comparing
their approaches and results.

Participants will be encouraged to share resources and intermediate
system outputs to lower entry barriers and enable analysis of various
components' contributions and interactions.

* You are invited to participate in TRECVID 2020 *

The evaluation is defined by the Guidelines. A draft version is
available, and further feedback from participants is welcome until
April 2020.

Please read the guidelines carefully before applying to participate in one or more tasks.

P l e a s e   n o t e:
1) Dissemination of TRECVID work and results other than in the
(publicly available) conference proceedings is welcomed, but the
conditions of participation specifically preclude any advertising claims based on TRECVID results.

2) All system output and results submitted to NIST are published in
the Proceedings or on the public portions of the TRECVID website archive.

3) The workshop is open only to participating groups that submit
results for at least one task and to selected government personnel
from sponsoring agencies and data donors.

4) Each participating group is required to submit, before the November
workshop, a notebook paper describing their experiments and results.
This is true even for groups who may not be able to attend the
workshop.

5) It is the responsibility of each team contact to make sure that
information distributed via the call for participation and the email list is disseminated to all team members with
a need to know. This includes information about deadlines and
restrictions on use of data.

6) By applying to participate you indicate your acceptance of the
above conditions and obligations.

A tentative schedule for the tasks is included on the Guidelines
webpage.

W o r k s h o p   f o r m a t

Plans are for a two-and-a-half-day workshop at NIST
in Gaithersburg, Maryland - just outside Washington, DC. Confirmation
and details will be provided to participants as soon as available.

The TRECVID workshop is used as a forum both for presentation of
results (including failure analyses and system comparisons), and for
longer system presentations describing the retrieval techniques
used, experiments run using the data, and other issues of interest to
researchers in information retrieval and computer vision. As there is 
a limited amount of time for these presentations, the evaluation coordinators and NIST
will determine which groups are asked to speak and which groups will
present in a poster session. Groups that are interested in having a
speaking slot during the workshop will be asked to submit a short
abstract before the workshop describing the experiments they
performed. Speakers will be selected based on these abstracts.

H o w   t o   r e s p o n d   t o   t h i s   c a l l

Organizations wishing to participate in TRECVID 2020 must respond
to this call for participation by submitting an online application by
1 April (the earlier the better).  Only ONE APPLICATION PER TEAM please, regardless of how
many organizations the team comprises.

*PLEASE* only apply if you are able and fully intend to complete the
work for at least one task. Taking the data but not submitting any
runs threatens the continued operation of the workshop and the
availability of data for the entire community.

Here is the application URL:

You will receive an immediate automatic response when your application
is received. NIST will respond with more detail to all applications
submitted before the end of March. At that point you will be given
the active participant's userid and password and subscribed to the
tv20.list email discussion list, and you can participate in
finalizing the guidelines and sign up to get the data, access to which
is controlled by separate passwords.

T R E C V I D   2 0 2 0   e m a i l   d i s c u s s i o n   l i s t

The tv20.list email discussion list will serve as
the main forum for discussion and for disseminating information about
TRECVID 2020.  It is each participant's responsibility to monitor the
tv20.list postings.  It accepts postings only from the email addresses
used to subscribe to it.  At the bottom of the guidelines there is a
link to an archive of past postings available using the active
participant's userid/password.

Q u e s t i o n s

Any administrative questions about conference participation,
application format/content, subscriptions to the tv20.list,
etc. should be sent to george.awad at

Best regards,

TRECVID 2020 organizers team