Call for Participants - Robotic Vision Scene Understanding (RVSU) Challenge
This is a call for participants for the latest
ACRV robotic vision scene understanding (RVSU) challenge.
This challenge is being run as one of multiple embodied AI challenges
in the CVPR 2022 Embodied AI Workshop.
Eval AI Challenge Link: https://eval.ai/web/challenges/challenge-page/1614/overview
Challenge Overview Webpage: http://cvpr2022.roboticvisionchallenge.org/
Embodied AI Workshop Webpage: https://embodied-ai.org/
Deadline: May 31st
Prizes: Total of $2500 USD, 2 NVIDIA RTX 6000 GPUs and up to 10 NVIDIA
Jetson Nano GPUs to be distributed
Register Interest: https://forms.gle/v1q4Lqtu7AzHgrfb7
The Robotic Vision Scene Understanding Challenge evaluates how well a
robotic vision system can understand the semantic and geometric
aspects of its environment. The challenge consists of two distinct
tasks: Object-based Semantic SLAM, and Scene Change Detection.
Key features of this challenge include:
- BenchBot, a complete software stack for running semantic scene understanding algorithms.
- Running algorithms in realistic 3D simulation, and on real robots, with only a few lines of Python code.
- Tiered difficulty levels to allow for ease of entry to robotic vision with embodied agents and enable ablation studies.
- The BenchBot API, which allows simple interfacing with robots and supports OpenAI Gym-style approaches and a simple object-oriented Agent approach.
- Easy-to-use scripts for running simulated environments, executing code on a simulated robot, evaluating semantic scene understanding results, and automating code execution across multiple environments.
- Use of the NVIDIA Omniverse-powered Isaac Sim for interfacing with, and simulation of, high-fidelity 3D environments.
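To give a flavour of the Gym-style interaction model, here is a minimal toy sketch. Note that `FakeRobot`, its action name, and its observation keys are hypothetical stand-ins for illustration only; the real BenchBot API has its own class names, actions, and observation formats.

```python
class FakeRobot:
    """Hypothetical stand-in for a BenchBot-style environment (illustrative only)."""

    def __init__(self, steps=3):
        self._remaining = steps

    def is_done(self):
        # True once the robot has visited all of its poses
        return self._remaining <= 0

    def step(self, action):
        # Gym-like step: apply an action, return (observations, done)
        self._remaining -= 1
        observations = {'image_rgb': None,
                        'image_depth': None,
                        'poses': {'robot': (0.0, 0.0, 0.0)}}
        return observations, self.is_done()


robot = FakeRobot()
actions_taken = []
while not robot.is_done():
    obs, done = robot.step('move_next')  # e.g. advance to the next pose
    actions_taken.append('move_next')

print(len(actions_taken))  # 3
```

The appeal of this style is that an entire agent fits in a short observe-act loop, which is what makes "a few lines of Python" feasible.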
Object-based Semantic SLAM: Participants use a robot to traverse
around the environment, building up an object-based semantic map from
the robot's RGBD sensor observations and odometry measurements.
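An object-based semantic map is, at its core, a collection of detected objects with class labels and 3D geometry. The field names below are illustrative, not the challenge's exact submission format:

```python
# Sketch of an object-based semantic map as plain Python data
# (field names are assumptions for illustration, not the official schema).
semantic_map = {
    'objects': [
        {'class': 'chair',
         'confidence': 0.9,
         'centroid': [1.2, 0.4, 0.0],   # metres, world frame
         'extent': [0.5, 0.5, 0.9]},    # bounding-box dimensions
        {'class': 'bottle',
         'confidence': 0.7,
         'centroid': [2.0, -1.1, 0.8],
         'extent': [0.1, 0.1, 0.3]},
    ]
}

print(len(semantic_map['objects']))  # 2
```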
Scene Change Detection: Participants use a robot to traverse through
an environment scene, building up a semantic understanding of the
scene. Then the robot is moved to a new start position in the same
environment, but with different conditions. Along with a possible
change from day to night, the new scene has a number of objects added
and/or removed. Participants must produce an object-based semantic map
describing the changes between the two scenes.
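Conceptually, the change-detection output is a diff between two semantic maps. The naive sketch below compares maps by class label alone; a real submission would need to associate objects spatially, since the same class can appear multiple times:

```python
def detect_changes(map_a, map_b):
    """Naive illustrative diff of two semantic maps by class label only.
    Real systems must associate objects spatially as well."""
    classes_a = {obj['class'] for obj in map_a}
    classes_b = {obj['class'] for obj in map_b}
    return {'added': sorted(classes_b - classes_a),
            'removed': sorted(classes_a - classes_b)}


scene1 = [{'class': 'chair'}, {'class': 'bottle'}]
scene2 = [{'class': 'chair'}, {'class': 'plant'}]
print(detect_changes(scene1, scene2))
# {'added': ['plant'], 'removed': ['bottle']}
```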
Difficulty Levels: We provide three difficulty levels of increasing
complexity and similarity to true active robotic vision systems. At
the simplest difficulty level (PGT), the robot moves to pre-defined
poses to collect data and provides ground-truth poses, removing the
need for active exploration and localization. The second level (AGT)
requires active exploration and robot control but still provides
ground-truth pose to remove localization requirements. The final mode
(ADR) is the same as the previous level but provides only noisy odometry
information, requiring localization to be calculated by the system.
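To see why ADR is the hardest mode, consider the simplest possible use of odometry: dead-reckoning, i.e. integrating relative motion steps into a pose. With noisy odometry this estimate drifts over time, which is why ADR requires proper localization. A minimal 2D sketch (not part of the challenge software):

```python
import math


def integrate_odometry(start_pose, odom_steps):
    """Dead-reckon a 2D pose (x, y, theta) from relative odometry steps.

    Each step is (dx, dy, dtheta) in the robot's body frame. With noisy
    odometry (ADR mode), errors accumulate, so a real system would
    correct this estimate with SLAM-style localization.
    """
    x, y, theta = start_pose
    for dx, dy, dtheta in odom_steps:
        # Rotate the body-frame translation into the world frame
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        theta += dtheta
    return (x, y, theta)


# Drive 1 m forward, turn 90 degrees left, drive 1 m forward again
pose = integrate_odometry((0.0, 0.0, 0.0),
                          [(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)])
print([round(v, 3) for v in pose])  # [1.0, 1.0, 1.571]
```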