Call for Participants - Robotic Vision Scene Understanding (RVSU) Challenge

Dear Researchers,

This is a call for participants for the latest ACRV Robotic Vision
Scene Understanding (RVSU) Challenge.

This challenge is one of several embodied AI challenges being run as
part of the CVPR 2021 Embodied AI Workshop.

Eval AI Challenge Link:

Challenge Overview Webpage:

Workshop Webpage:

Deadline: May 7th

Prizes: Total of $2500 USD, 2 Titan RTX GPUs and up to 10 Jetson Nano
GPUs to be distributed (details below)

Challenge Overview

The Robotic Vision Scene Understanding Challenge evaluates how well a
robotic vision system can understand the semantic and geometric
aspects of its environment. The challenge consists of two distinct
tasks: Object-based Semantic SLAM, and Scene Change Detection.

Key features of this challenge include:

    BenchBot, a complete software stack for running semantic scene
    understanding algorithms.

    Running algorithms in realistic 3D simulation, and on real robots,
    with only a few lines of Python code.

    Tiered difficulty levels to allow for ease of entry to robotic
    vision with embodied agents and to enable ablation studies.

    The BenchBot API, which allows simple interfacing with robots and
    supports OpenAI Gym-style approaches and a simple object-oriented
    Agent approach.

    Easy-to-use scripts for running simulated environments, executing
    code on a simulated robot, evaluating semantic scene understanding
    results, and automating code execution across multiple

    Opportunities for the best teams to execute their code on a real
    robot in our lab, which uses the same API as the simulated robot.

    Use of the Nvidia Isaac SDK for interfacing with, and simulation
    of, high fidelity 3D environments.

Object-based Semantic SLAM: Participants use a robot to traverse the
environment, building up an object-based semantic map from the robot's
RGBD sensor observations and odometry measurements.

Scene Change Detection: Participants use a robot to traverse an
environment, building up a semantic understanding of the scene. The
robot is then moved to a new start position in the same environment,
but with different conditions. Along with a possible change from day
to night, the new scene has a number of objects added and/or removed.
Participants must produce an object-based semantic map describing the
changes between the two scenes.
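
As a rough illustration of the Scene Change Detection task, the sketch
below compares two object-based maps and reports which objects were
added and which were removed. It is a minimal sketch under stated
assumptions, not the challenge's required output format or evaluation
method: the object representation (a class label plus a 3D centroid),
the function name detect_changes, and the 0.5 m matching threshold are
all hypothetical.

```python
from math import dist

# Hypothetical object representation: (class_label, (x, y, z) centroid in metres).
# This is NOT the challenge's results format, only an illustration.


def detect_changes(map_before, map_after, max_dist=0.5):
    """Return (added, removed) objects between two object-based maps.

    Assumption: two objects are the same if they share a class label and
    their centroids lie within max_dist of each other. Matching is greedy
    (first candidate found), which is enough for a small example.
    """
    unmatched_after = list(map_after)
    removed = []
    for label, centroid in map_before:
        match = next(
            (obj for obj in unmatched_after
             if obj[0] == label and dist(obj[1], centroid) <= max_dist),
            None,
        )
        if match is None:
            # Nothing similar exists in the second scene.
            removed.append((label, centroid))
        else:
            # Each object in the second scene can be matched only once.
            unmatched_after.remove(match)
    # Whatever is left in the second scene has no counterpart in the first.
    return unmatched_after, removed


before = [("chair", (1.0, 2.0, 0.0)), ("mug", (0.5, 0.5, 0.9))]
after = [("chair", (1.1, 2.0, 0.0)), ("laptop", (0.4, 0.6, 0.9))]

added, removed = detect_changes(before, after)
print("added:", added)
print("removed:", removed)
```

In this toy scene the chair is matched despite a small shift in its
estimated position, the laptop is reported as added, and the mug is
reported as removed.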

Difficulty Levels: We provide three difficulty levels of increasing
complexity and similarity to true active robotic vision systems. At
the simplest difficulty level (PGT), the robot moves to pre-defined
poses to collect data and provides ground-truth poses, removing the
need for active exploration and localization. The second level (AGT)
requires active exploration and robot control but still provides
ground-truth pose to remove localization requirements. The final mode
(ADR) is the same as the previous but provides only noisy odometry
information, requiring localization to be calculated by the system.
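
The three levels can be summarised by two properties: whether the
system must actively explore, and whether ground-truth poses are
provided. The sketch below encodes that summary as a small table. The
class and field names are hypothetical and chosen only for
illustration; they are not part of any challenge configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifficultyLevel:
    """Hypothetical summary of one difficulty level described above."""
    name: str
    active_exploration: bool  # must the system control where the robot moves?
    ground_truth_pose: bool   # are ground-truth poses provided?


LEVELS = [
    DifficultyLevel("PGT", active_exploration=False, ground_truth_pose=True),
    DifficultyLevel("AGT", active_exploration=True, ground_truth_pose=True),
    DifficultyLevel("ADR", active_exploration=True, ground_truth_pose=False),
]


def needs_localization(level):
    """Without ground-truth poses, the system must localize itself."""
    return not level.ground_truth_pose


for level in LEVELS:
    print(
        f"{level.name}: explore={level.active_exploration}, "
        f"localize={needs_localization(level)}"
    )
```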

Prizes: As the challenge is complex, with multiple components, we
provide a tiered prize list. The highest-scoring team on each
leaderboard will be awarded the corresponding prize. Teams are allowed
to participate across all challenges and win multiple prizes.

    Scene Change Detection (ADR) - $900 USD, 1 Titan RTX GPU, up to 5 Jetson Nano GPUs
    Semantic SLAM (ADR) - $800 USD, 1 Titan RTX GPU, up to 5 Jetson Nano GPUs
    Semantic SLAM (AGT) - $500 USD
    Semantic SLAM (PGT) - $300 USD

Other Information

Contact Details
Twitter: @robVisChallenge

Partners and embodied AI challenges at CVPR 2021:


    iGibson Challenge 2021, hosted by Stanford Vision and Learning Lab
    and Robotics at Google

    Habitat Challenge 2021, hosted by Facebook AI Research (FAIR) and
    Georgia Tech

    Navigation and Rearrangement in AI2-THOR, hosted by the Allen
    Institute for AI

    ALFRED: Interpreting Grounded Instructions for Everyday Tasks,
    hosted by the University of Washington, Carnegie Mellon
    University, the Allen Institute for AI, and the University of
    Southern California

    Room-Across-Room Habitat Challenge (RxR-Habitat), hosted by Oregon
    State University, Google, and Facebook AI

    SoundSpaces Challenge, hosted by the University of Texas at Austin
    and the University of Illinois at Urbana-Champaign

    TDW-Transport, hosted by the Massachusetts Institute of Technology

    Robotic Vision Scene Understanding, hosted by the Australian
    Centre for Robotic Vision in association with the Queensland
    University of Technology Centre for Robotics

    MultiON: Multi-Object Navigation, hosted by the Indian Institute
    of Technology Kanpur, the University of Illinois at
    Urbana-Champaign, and Simon Fraser University