Robot dataset quality

Robots Need Data Quality & Diversity.

We provide golden sets, tested annotators, hidden quality checks, diversity checks, weighted consensus, and policy training proof.

12 datasets evaluating
61,080+ clips reviewed
10,003 hours reviewed

The quality pyramid

Raw video is cheap. Trusted signal is scarce.

A robot clip only helps training when the action is visible, the object is right, and the judgment comes from annotators who keep passing hidden checks.

Scarce + high value: cleaner signal compounds upward
Level 6
Model training results: policies get better
Outcome
Level 5
Weighted consensus: clip + dataset score
Highest
Level 4
Reliability weights: golden score over time
Trusted
Level 3
Failure reasons: blur, occlusion, wrong object
Explainable
Level 2
Good / bad judgments: hand-object usefulness
Useful
Level 1
Raw egocentric clips: abundant and noisy
Common

Common + noisy at the base. EgoArena moves the useful clips up.

How EgoArena scores

Every label carries a quality score.

01

Test annotators first.

Reviewers pass a short golden test before touching silver data.

02

Keep testing quietly.

Hidden golden clips inside earn mode catch drift over time.

03

Weight the consensus.

Silver clips complete when enough trusted reviewers agree.

04

Train the policies.

We train real policies and run rollouts, so the data's value is proven by whether the models actually improve.
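The scoring steps above can be sketched in code. This is a minimal illustration, not EgoArena's actual implementation: the function names, the 0.8 agreement threshold, and the minimum trusted weight are all assumptions made for the example.

```python
# Hypothetical sketch of reliability-weighted consensus scoring.
# Thresholds and names are illustrative, not EgoArena's real values.

def golden_score(history):
    """Annotator reliability: fraction of hidden golden clips answered correctly."""
    return sum(history) / len(history) if history else 0.0

def clip_consensus(votes, reliabilities, threshold=0.8, min_weight=2.0):
    """Decide a silver clip's label from trusted reviewers.

    votes: {annotator_id: True (good) / False (bad)}
    reliabilities: {annotator_id: golden score in [0, 1]}
    Returns (label, confidence) once enough trusted weight agrees,
    or None if the clip needs more reviews.
    """
    weight_good = sum(reliabilities[a] for a, v in votes.items() if v)
    weight_bad = sum(reliabilities[a] for a, v in votes.items() if not v)
    total = weight_good + weight_bad
    if total < min_weight:
        return None  # not enough trusted reviewers yet
    score = weight_good / total
    if score >= threshold:
        return ("good", score)
    if 1 - score >= threshold:
        return ("bad", 1 - score)
    return None  # no consensus; route to more reviewers
```

Weighting votes by each annotator's golden score means one reviewer who keeps failing hidden checks cannot outvote two who keep passing them.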

Live arena

Rank datasets by usable signal.

#1
EgoDex
hand-object interaction leaderboard
Live
#2
EgoVerse
quality scored from human review
Live
#3
HA-Ego
raw egocentric clips under audit
Live

For dataset builders

Find the clips worth training on.

Submit raw egocentric data. EgoArena turns noisy clips into ranked, reviewable signal.

Submit dataset