RoCo Challenge@AAAI 2026 — Onsite Experiments

Tasks, scoring, and updated onsite evaluation rules

IMPORTANT

Onsite Logistics & Deployment

Read Before Onsite Evaluation

Critical Notice: Due to potential network instability or limited internet access at the onsite venue, all teams must be fully prepared for offline execution. Teams that are not prepared may be unable to run their evaluations.
Item | Requirement | Scope | Details
Network constraints | Offline-ready deployment | All teams | Teams must bring their own storage device(s) (portable SSD / USB drive) and copy all source code, model checkpoints, and environment dependencies to the onsite workstation before evaluation starts.
Recommendation: Bring redundant copies of critical files and verify that the offline environment can run end-to-end on the onsite workstation prior to the recorded trials.
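One way to check that a copy on the onsite workstation matches the files on the storage device is to compare checksums. The following is a minimal sketch, not an official tool: the function names (`sha256_of`, `build_manifest`, `compare_copies`) and the directory layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(source: Path, copy: Path) -> list[str]:
    """List relative paths that are missing from the copy or differ in content."""
    src, dst = build_manifest(source), build_manifest(copy)
    return sorted(name for name, digest in src.items() if dst.get(name) != digest)
```

An empty list from `compare_copies(source, copy)` indicates that every file under `source` has an identical counterpart under `copy`. It does not show that the environment runs end-to-end, so a full offline test run is still needed.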
Section 1

Task Description

Onsite Evaluation

Task | Name | Initial State | Goal
Task 1 | Assembly from Scratch | Gears placed beside the planet carrier, pending assembly onto the pins. | Assemble the three gears correctly onto their corresponding pins on the carrier.
Task 2 | Resume from Partial State | Two gears already assembled correctly, with one gear placed beside the carrier for assembly. | Assemble the remaining gear onto the correct pin to complete the assembly.
Task 3 | Error Detection and Recovery | Three gears assembled, but one of them is larger than the correct size. One gear of the correct size is placed beside the carrier. | Correct the mistake by picking the wrongly assembled (larger) gear and placing it on the table, then assemble the correct gear onto the intended pin.
Note: “Correctly assembled” means the gear is placed on its intended pin/position according to the official evaluation script.
Section 2

Scoring Rules (Updated)

Atomic Event Scoring

Event | Definition | Points | Notes
Pick success (Tasks 1 & 2) | Grasp & lift a target gear | 1 | The gripper grasps the gear and lifts it off the table (detached from support).
Place success (Tasks 1 & 2) | Place gear onto intended pin | 2 | Correct placement onto the target pin meeting the required pose/fit.
Pick wrong gear (Task 3) | Pick the wrongly assembled (larger) gear | 2 | Grasp the wrong gear and lift it until detached from the pin. Picking the wrong gear is rewarded higher in Task 3 to emphasize error removal.
Place wrong gear to table (Task 3) | Place the removed wrong gear on the table | 1 | After removal, the wrong gear must be placed on the table to complete the recovery step.
No double counting | First success only per event | - | Repeated attempts on the same gear/event are not double-counted; only the first success is scored.
Final score | Weighted aggregation | - | Final Score = (4*S1 + 2*S2 + 4*S3) / 10 (consistent with the simulation setting).
Note 1: S1 / S2 / S3 denote the task scores computed by the official evaluation scripts under the updated event scoring.
Note 2: In Task 3, picking the correct gear scores points only if the preceding removal of the wrong gear was successful.
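The event points and the weighted aggregation above can be sketched in a few lines. This is an illustrative sketch only, not the official evaluation script: the event keys, the `(event, gear_id)` representation, and the function names are assumptions. It does not model the Task 3 rule in Note 2, because the point values for correct-gear events in Task 3 are not listed in the scoring table.

```python
# Points per atomic event, copied from the scoring table.
# The dictionary keys are hypothetical labels chosen for this sketch.
EVENT_POINTS = {
    "pick": 1,                  # Tasks 1 & 2: grasp and lift a target gear
    "place": 2,                 # Tasks 1 & 2: place gear onto intended pin
    "pick_wrong": 2,            # Task 3: remove the wrongly assembled gear
    "place_wrong_to_table": 1,  # Task 3: put the removed gear on the table
}


def tally_events(successes):
    """Sum points over successful (event, gear_id) pairs, counting each pair once."""
    seen = set()
    total = 0
    for event, gear_id in successes:
        if (event, gear_id) in seen:
            continue  # no double counting: only the first success is scored
        seen.add((event, gear_id))
        total += EVENT_POINTS[event]
    return total


def final_score(s1, s2, s3):
    """Final Score = (4*S1 + 2*S2 + 4*S3) / 10."""
    return (4 * s1 + 2 * s2 + 4 * s3) / 10


# Task 1 with all three gears picked and placed, one pick logged twice.
task1 = [("pick", g) for g in "ABC"] + [("place", g) for g in "ABC"] + [("pick", "A")]
print(tally_events(task1))   # 9
print(final_score(9, 3, 5))  # 6.2 (S3 = 5 is an arbitrary illustrative value)
```

With these weights, Tasks 1 and 3 each contribute 40% of the final score and Task 2 contributes 20%.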
Section 3

Evaluation Procedure (Updated)

Time Budget & Trials

Rule | Setting | Value | Explanation
Total team time | Per team (onsite) | 35 min | Each team has 35 minutes of total onsite time: 5 minutes for initialization and 30 minutes for performance testing.
Recorded results | Per task | 4 results | For each task, the committee records 4 test results. Teams may attempt more runs, but only the 4 highest results are recorded.
Per-task time cap | Task 1 / 2 / 3 | 12 / 6 / 12 min | Time caps per task: Task 1 = 12 min, Task 2 = 6 min, Task 3 = 12 min. If fewer than 4 tests are completed within the time cap, the missing results are recorded as 0.
Important: The initialization window (5 min) is intended for environment setup and safety checks only. Teams should plan to start evaluation runs promptly during the 30-minute testing window.
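The recording rule (keep the 4 highest results per task, record missing results as 0) and the time budget can be checked with a short sketch. The names below are hypothetical, and how the committee combines the 4 recorded results into a task score is not stated here, so the sketch stops at selecting them.

```python
RECORDED_PER_TASK = 4
TIME_CAPS_MIN = {"Task 1": 12, "Task 2": 6, "Task 3": 12}
TESTING_WINDOW_MIN = 30


def recorded_results(run_scores, n=RECORDED_PER_TASK):
    """Keep the n highest run scores; pad with 0 if fewer than n runs finished."""
    top = sorted(run_scores, reverse=True)[:n]
    return top + [0] * (n - len(top))


# The per-task caps add up to exactly the 30-minute testing window.
print(sum(TIME_CAPS_MIN.values()) == TESTING_WINDOW_MIN)  # True

print(recorded_results([3, 9, 6, 0, 9]))  # [9, 9, 6, 3]
print(recorded_results([6, 3]))           # [6, 3, 0, 0]
```

Since the three caps fill the whole testing window, a task cannot borrow unused time from the initialization window.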