Tasks, scoring, and updated onsite evaluation rules
Read Before Onsite Evaluation
| Item | Requirement | Scope | Details |
|---|---|---|---|
| Network constraints | Offline-ready deployment | All teams | Teams must bring their own storage device(s) (portable SSD / USB drive) and copy all source code, model checkpoints, and environment dependencies to the onsite workstation before evaluation starts. |
Onsite Evaluation
| Task | Name | Initial State | Goal |
|---|---|---|---|
| Task 1 | Assembly from Scratch | Gears placed beside the planet carrier, pending assembly onto the pins. | Assemble three gears correctly onto the corresponding pins on the carrier. |
| Task 2 | Resume from Partial State | Two gears already assembled correctly, with the remaining gear placed beside the carrier for assembly. | Assemble the remaining gear onto the correct pin to complete the assembly. |
| Task 3 | Error Detection and Recovery | Three gears assembled, but one of them is larger than the correct size. One gear of the correct size is placed beside the carrier. | Correct the mistake by picking the wrongly assembled (larger) gear and placing it on the table, then assemble the correct gear onto the intended pin. |
Atomic Event Scoring
| Event | Definition | Points | Notes |
|---|---|---|---|
| Pick success (Task 1&2) | Grasp & lift a target gear | 1 | The gripper grasps the gear and lifts it off the table (detached from support). |
| Place success (Task 1&2) | Place gear onto intended pin | 2 | Correct placement onto the target pin meeting the required pose/fit. |
| Pick wrong gear (Task 3) | Pick the wrongly assembled (larger) gear | 2 | Grasp the wrong gear and lift it until detached from the pin. Picking the wrong gear is rewarded higher in Task 3 to emphasize error removal. |
| Place wrong gear to table (Task 3) | Place the removed wrong gear on the table | 1 | After removal, the wrong gear must be placed on the table to complete the recovery step. |
| No double counting | First success only per event | — | Repeated attempts on the same gear/event are not double-counted; only the first success is scored. |
| Final score | Weighted aggregation | — | Final Score = (4*S1 + 2*S2 + 4*S3) / 10, where S1, S2, and S3 are the scores for Tasks 1, 2, and 3 (consistent with the simulation setting). |
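The scoring rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the committee's scoring tool: the event names (`pick`, `place`, `pick_wrong`, `place_wrong_table`), the `(event, gear)` log format, and both function names are assumptions made for the example. How each task score S1, S2, S3 is derived from the recorded trials is not stated here, so `final_score` simply takes the three task scores as inputs.

```python
# Points per atomic event, copied from the Atomic Event Scoring table.
# The dictionary keys are hypothetical labels chosen for this sketch.
EVENT_POINTS = {
    "pick": 1,               # Task 1 & 2: grasp and lift a target gear
    "place": 2,              # Task 1 & 2: place gear onto the intended pin
    "pick_wrong": 2,         # Task 3: lift the wrong (larger) gear off its pin
    "place_wrong_table": 1,  # Task 3: put the removed gear on the table
}


def trial_points(events):
    """Sum the points of one trial.

    `events` is an iterable of (event_name, gear_id) pairs in the order
    they occurred. Only the first success per (event, gear) pair counts,
    which mirrors the "no double counting" rule.
    """
    seen = set()
    total = 0
    for name, gear in events:
        key = (name, gear)
        if key in seen:
            continue  # repeated success on the same gear/event: ignored
        seen.add(key)
        total += EVENT_POINTS[name]
    return total


def final_score(s1, s2, s3):
    """Weighted aggregation: (4*S1 + 2*S2 + 4*S3) / 10."""
    return (4 * s1 + 2 * s2 + 4 * s3) / 10
```

Under these assumptions, a Task 1 trial in which all three gears are picked and placed tallies 3 × (1 + 2) = 9 points, and a repeated pick of the same gear adds nothing.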
Time Budget & Trials
| Rule | Setting | Value | Explanation |
|---|---|---|---|
| Total team time | Per team (onsite) | 35 min | Each team has 35 minutes total onsite time: 5 minutes for initialization and 30 minutes for performance testing. |
| Recorded results | Per task | 4 results | For each task, the committee records 4 test results. Teams may attempt more runs, but only the highest 4 results will be recorded. |
| Per-task time cap | Task 1 / 2 / 3 | 12 / 6 / 12 min | Time caps per task: Task 1 = 12 min, Task 2 = 6 min, Task 3 = 12 min. If fewer than 4 tests are completed within a task's time cap, the missing results are recorded as 0. |
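The recording rule (keep the highest 4 results, fill any missing ones with 0) can be illustrated with a short Python sketch. The function name `recorded_results` and the constant names are hypothetical and chosen only for this example; the values come from the table above.

```python
# Per-task time caps in minutes, from the Time Budget & Trials table.
TASK_TIME_CAP_MIN = {1: 12, 2: 6, 3: 12}
PERFORMANCE_WINDOW_MIN = 30  # 35 min total minus 5 min initialization


def recorded_results(trial_scores, n_recorded=4):
    """Return the results recorded for one task.

    Keeps the highest `n_recorded` scores among the trials completed
    within the time cap and pads with 0 if fewer were completed.
    """
    top = sorted(trial_scores, reverse=True)[:n_recorded]
    return top + [0] * (n_recorded - len(top))


# The three caps add up exactly to the 30-minute performance window.
assert sum(TASK_TIME_CAP_MIN.values()) == PERFORMANCE_WINDOW_MIN
```

For example, a team that completes only three Task 1 trials scoring 3, 9, and 6 would have `[9, 6, 3, 0]` recorded, while a team that completes five trials keeps only its best four.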