Task definitions, data specification, and sensor setup for offline evaluation
The onsite experiments consist of three tasks derived from a planetary-gear assembly workflow. Together, the tasks evaluate manipulation accuracy, continuation from partial progress, and error recovery under realistic deployment conditions.
| Task | Goal | Notes |
|---|---|---|
| Task 1 | Install three planetary gears onto the carrier pins | Each planetary gear must be picked up and mounted on its correct carrier pin, seated correctly and stably. |
| Task 2 | Continue from partial progress on Task 1 (two gears already installed) | Task 2 is not provided as a separate dataset. It uses the Task 1 data format but starts from an intermediate state in which two gears are already assembled; the policy must install the remaining gear. |
| Task 3 | Error detection and recovery: remove an incorrectly installed larger gear and replace it with the correct smaller gear | Task 3 targets recovery behavior: the robot must detect the incorrect assembly, remove the wrong part, and install the correct replacement. |
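Because Task 2 shares the Task 1 data format, one possible way to obtain Task 2-style starting states for offline evaluation is to cut a Task 1 trajectory at the point where the second gear is in place. The sketch below assumes a per-step `gears_installed` count, which is a hypothetical annotation, not a documented field; `Step`, `task2_start_index`, and `task2_segment` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical per-step record. `gears_installed` (number of planetary
    # gears already mounted) is an assumed annotation, not a dataset field.
    t: float
    gears_installed: int


def task2_start_index(trajectory: List[Step], preinstalled: int = 2) -> int:
    """Index of the first step at which `preinstalled` gears are mounted."""
    for i, step in enumerate(trajectory):
        if step.gears_installed >= preinstalled:
            return i
    raise ValueError(f"trajectory never reaches {preinstalled} installed gears")


def task2_segment(trajectory: List[Step], preinstalled: int = 2) -> List[Step]:
    """Remainder of a Task 1 trajectory, starting from the Task 2 initial state."""
    return trajectory[task2_start_index(trajectory, preinstalled):]
```

For example, a trajectory whose steps report 0, 1, 2, 3 installed gears yields a Task 2 segment made of its last two steps. The onsite Task 2 episodes themselves start from a physically staged two-gear state; this slicing is only an offline approximation.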
Each trajectory provides synchronized multi-modal observations and actions for a dual-arm mobile-manipulation platform. The dataset includes robot state/action streams, end-effector feedback, and multi-camera visual observations.
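A minimal sketch of how one trajectory might be held in memory and checked for alignment, assuming every stream has already been synchronized to a shared list of timestamps. The `Trajectory` class, its field names, and the `validate` method are hypothetical illustrations, not the dataset's actual schema or loader.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    # Hypothetical container: every stream holds one entry per timestep.
    timestamps: List[float]
    state: List[List[float]]
    action: List[List[float]]
    ee_feedback: List[List[float]]
    images: Dict[str, List[object]] = field(default_factory=dict)  # camera name -> frames

    def validate(self) -> None:
        """Raise ValueError unless all streams align with `timestamps`."""
        n = len(self.timestamps)
        # Timestamps must be strictly increasing.
        if any(b <= a for a, b in zip(self.timestamps, self.timestamps[1:])):
            raise ValueError("timestamps must be strictly increasing")
        streams = {"state": self.state, "action": self.action,
                   "ee_feedback": self.ee_feedback}
        streams.update({f"images[{k}]": v for k, v in self.images.items()})
        # Every stream must have exactly one entry per timestamp.
        for name, values in streams.items():
            if len(values) != n:
                raise ValueError(f"{name} has {len(values)} entries, expected {n}")
```

Checking lengths against a single timestamp list is a cheap guard against truncated or misaligned streams before training or evaluation.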
Visual observations are recorded from head and wrist cameras. The head camera provides a stereo RGB pair (left/right), and each wrist camera provides RGB plus depth.
| Camera | Modality | Details |
|---|---|---|
| Head (Stereo Left) | RGB | Left head camera RGB stream. |
| Head (Stereo Right) | RGB | Right head camera RGB stream. |
| Wrist (Left) | RGB + Depth | Left wrist camera provides RGB and depth observations. |
| Wrist (Right) | RGB + Depth | Right wrist camera provides RGB and depth observations. |
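The camera table can be encoded as a small lookup so that loaders know which cameras carry native depth. In this sketch the dictionary keys (`head_left`, `wrist_right`, and so on) and the helper functions are hypothetical names, not identifiers from the dataset files.

```python
from typing import Dict, FrozenSet, List

# Camera layout from the table above; keys are hypothetical names.
CAMERAS: Dict[str, FrozenSet[str]] = {
    "head_left": frozenset({"rgb"}),
    "head_right": frozenset({"rgb"}),
    "wrist_left": frozenset({"rgb", "depth"}),
    "wrist_right": frozenset({"rgb", "depth"}),
}


def cameras_with(modality: str) -> List[str]:
    """Sorted names of cameras that record `modality` natively."""
    return sorted(name for name, mods in CAMERAS.items() if modality in mods)


def depth_source(camera: str) -> str:
    """'native' for cameras with a depth stream, otherwise 'stereo'.

    Raises KeyError for an unknown camera name. Only the head cameras lack
    native depth, and their depth must come from the left/right stereo pair.
    """
    return "native" if "depth" in CAMERAS[camera] else "stereo"
```

Here `cameras_with("depth")` returns `["wrist_left", "wrist_right"]`, and `depth_source("head_left")` returns `"stereo"`.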
The head cameras do not provide native depth. If depth is required, it must be reconstructed from the stereo RGB pair using camera intrinsics/extrinsics.
Head stereo calibration file: `stereo.yaml`
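For a rectified stereo pair, depth follows from disparity as Z = f_x · B / d, where d = u_left − u_right in pixels, f_x is the focal length in pixels, and B is the baseline. A pixel (u, v) at depth Z back-projects to X = (u − c_x) Z / f_x and Y = (v − c_y) Z / f_y. The sketch below assumes the images are already rectified, that f_x, f_y, c_x, c_y are the rectified intrinsics, and that B is the norm of the left-to-right translation from the extrinsics. The key layout of `stereo.yaml` is not documented here, so the function names and arguments are hypothetical. The disparity itself would come from a stereo matcher such as OpenCV's `StereoSGBM`, whose raw output is fixed-point and must be divided by 16.

```python
import math
from typing import Optional, Sequence, Tuple


def baseline_from_extrinsics(t_left_to_right: Sequence[float]) -> float:
    """Baseline B as the Euclidean norm of the left-to-right camera translation."""
    return math.sqrt(sum(c * c for c in t_left_to_right))


def disparity_to_depth(disparity: float, fx: float, baseline: float) -> Optional[float]:
    """Depth Z = fx * B / d for a rectified pair; None for invalid disparity."""
    if disparity <= 0.0:
        return None  # no valid match (or point at infinity)
    return fx * baseline / disparity


def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Pinhole back-projection of pixel (u, v) at depth z into the left camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)
```

With f_x = 600 px, a translation of (−0.06, 0, 0) m (so B = 0.06 m), and a disparity of 12 px, the depth is 600 · 0.06 / 12 = 3.0 m. The units of Z match the units of the translation vector.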