Call for Participants - Robotic Vision Scene Understanding (RVSU) Challenge

Dear Researchers,

This is a call for participants for the latest
ACRV Robotic Vision Scene Understanding (RVSU) challenge.

This challenge is being run as one of multiple embodied AI challenges
in the CVPR2021 Embodied AI Workshop.

Eval AI Challenge Link:

Challenge Overview Webpage:

Embodied AI Workshop Webpage:

Deadline: May 31st

Prizes: a total of $2500 USD, 2 NVIDIA RTX 6000 GPUs, and up to
10 NVIDIA Jetson Nano GPUs to be distributed

Register Interest:

Challenge Overview

The Robotic Vision Scene Understanding Challenge evaluates how well a
robotic vision system can understand the semantic and geometric
aspects of its environment. The challenge consists of two distinct
tasks: Object-based Semantic SLAM, and Scene Change Detection.

Key features of this challenge include:

    BenchBot, a complete software stack for running semantic scene
    understanding algorithms.

    Running algorithms in realistic 3D simulation, and on real robots,
    with only a few lines of Python code.

    Tiered difficulty levels to allow for ease of entry to robotic
    vision with embodied agents and to enable ablation studies.

    The BenchBot API, which allows simple interfacing with robots and
    supports OpenAI Gym-style approaches and a simple object-oriented
    Agent approach.

    Easy-to-use scripts for running simulated environments, executing
    code on a simulated robot, evaluating semantic scene understanding
    results, and automating code execution across multiple environments.
    Use of the NVIDIA Omniverse Isaac Sim for interfacing with, and
    simulation of, high-fidelity 3D environments.
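
The Gym-style interface mentioned above can be pictured as a standard
observe-act loop. The sketch below uses a stand-in environment class; the
class, method, and observation names are illustrative assumptions for this
sketch, not the exact BenchBot API.

```python
# Hypothetical sketch of a Gym-style control loop. SimpleEnv and its
# method/field names are illustrative stand-ins, not the BenchBot API.

class SimpleEnv:
    """Stand-in environment exposing a Gym-style reset()/step() interface."""

    def __init__(self, num_steps=3):
        self._remaining = num_steps

    def reset(self):
        # Return the initial observations (placeholders here).
        return {"image_rgb": None, "poses": {}}

    def step(self, action):
        # Apply an action, return new observations and a done flag.
        self._remaining -= 1
        observations = {"image_rgb": None, "poses": {}}
        done = self._remaining <= 0
        return observations, done

env = SimpleEnv()
observations = env.reset()
done = False
while not done:
    # A real agent would pick actions from the observations;
    # here we always issue a fixed placeholder action.
    observations, done = env.step("move_next")
```

An object-oriented Agent approach would wrap the body of this loop in an
agent class instead, but the observe-act cycle is the same.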

Object-based Semantic SLAM: Participants use a robot to traverse
around the environment, building up an object-based semantic map from
the robot's RGBD sensor observations and odometry measurements.
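
An object-based semantic map of this kind can be thought of as a list of
detected objects, each carrying a class label, a confidence, and a 3D
bounding volume. A minimal sketch follows; the field names and layout are
assumptions for illustration, not the challenge's exact submission format.

```python
# Minimal sketch of an object-based semantic map. The dictionary fields
# are illustrative assumptions, not the official RVSU submission schema.

def make_object(class_label, confidence, centroid, extent):
    """One map entry: a labelled 3D box given by its centroid and size."""
    return {
        "class": class_label,
        "confidence": confidence,
        "centroid": centroid,  # (x, y, z) position in metres
        "extent": extent,      # (dx, dy, dz) box dimensions in metres
    }

semantic_map = {"objects": [
    make_object("chair", 0.91, (1.2, 0.4, 0.0), (0.5, 0.5, 1.0)),
    make_object("table", 0.87, (2.0, 1.1, 0.0), (1.4, 0.8, 0.75)),
]}
```

A SLAM system would append entries like these as objects are detected and
fused across the robot's RGBD observations.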

Scene Change Detection: Participants use a robot to traverse through
an environment scene, building up a semantic understanding of the
scene. Then the robot is moved to a new start position in the same
environment, but with different conditions. Along with a possible
change from day to night, the new scene has a number of objects added
and/or removed. Participants must produce an object-based semantic map
describing the changes between the two scenes.
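
Conceptually, the change map is the difference between the object sets of
the two traversals. The toy sketch below matches objects by class label and
centroid proximity; this matching rule, the threshold, and the field names
are assumptions for illustration, not the challenge's required method.

```python
# Toy sketch of scene change detection: compare two object-based maps and
# report which objects were added or removed. Matching by class label plus
# centroid distance is an illustrative assumption, not a prescribed method.
import math

def centroid_dist(a, b):
    """Euclidean distance between two objects' centroids."""
    return math.dist(a["centroid"], b["centroid"])

def diff_maps(map1, map2, max_dist=0.5):
    """Return objects only in map2 (added) or only in map1 (removed)."""
    def unmatched(source, reference):
        out = []
        for obj in source:
            matched = any(o["class"] == obj["class"]
                          and centroid_dist(o, obj) <= max_dist
                          for o in reference)
            if not matched:
                out.append(obj)
        return out
    return {"added": unmatched(map2, map1), "removed": unmatched(map1, map2)}

day = [{"class": "chair", "centroid": (1.0, 0.0, 0.0)}]
night = [{"class": "chair", "centroid": (1.1, 0.0, 0.0)},
         {"class": "bottle", "centroid": (2.0, 1.0, 0.5)}]
changes = diff_maps(day, night)
```

Here the chair in both traversals is matched (its centroids are close), so
only the bottle is reported as added and nothing as removed.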

Difficulty Levels: We provide three difficulty levels of increasing
complexity and similarity to true active robotic vision systems. At
the simplest difficulty level (PGT), the robot moves to pre-defined
poses to collect data and provides ground-truth poses, removing the
need for active exploration and localization. The second level (AGT)
requires active exploration and robot control but still provides
ground-truth pose to remove localization requirements. The final mode
(ADR) is the same as the previous level but provides only noisy odometry
information, requiring localization to be calculated by the system.
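
The three levels differ in exactly two capabilities: whether the system
must control the robot, and whether ground-truth pose is supplied. One way
to summarise this (the key names below are assumptions for this sketch, not
official challenge configuration):

```python
# Illustrative summary of the three RVSU difficulty levels. The dictionary
# keys are assumptions made for this sketch, not official config fields.
DIFFICULTY_LEVELS = {
    "PGT": {"active_control": False, "ground_truth_pose": True},   # pre-defined poses
    "AGT": {"active_control": True,  "ground_truth_pose": True},   # active exploration
    "ADR": {"active_control": True,  "ground_truth_pose": False},  # noisy odometry only
}
```

This tiering is what enables the ablation studies mentioned earlier: a team
can isolate the cost of exploration (PGT vs AGT) or of localization
(AGT vs ADR).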

Other Information

Contact Details
Twitter: @robVisChallenge