| Method / Task | Robosuite | | | | MimicGen | | Average |
|---|---|---|---|---|---|---|---|
| | Lift | Stack | Can | Square | Three Piece Assembly | Stack Three | |
| DP3 | 88.7±4.2 | 72.0±2.0 | 64.7±1.2 | 36.7±1.2 | 35.3±6.4 | 20.0±3.5 | 52.9 |
| DiT-Policy | 90.7±4.2 | 68.7±7.6 | 64.7±3.1 | 34.7±2.3 | 37.3±7.6 | 18.7±5.0 | 52.5 |
| FreqPolicy | 89.3±1.2 | 71.3±1.2 | 63.3±2.3 | 36.0±3.5 | 27.3±8.1 | 22.0±4.0 | 51.5 |
| FGO (Ours) | 92.7±3.1 | 79.3±3.1 | 66.0±0.0 | 36.7±3.1 | 39.3±7.0 | 25.3±3.1 | 56.6 |

Table 1: Comparison of success rates (%) on the Robosuite and MimicGen benchmarks. Results are computed across 3 training seeds.


| Method / Task | Adroit | | | DexArt | | | | Average |
|---|---|---|---|---|---|---|---|---|
| | Hammer | Door | Pen | Laptop | Toilet | Faucet | Bucket | |
| DP3 | 100.0±0.0 | 61.3±7.6 | 46.0±5.3 | 77.3±5.0 | 60.7±4.2 | 21.3±4.2 | 24.7±2.3 | 55.9 |
| DiT-Policy | 100.0±0.0 | 63.3±7.6 | 52.0±2.0 | 75.3±3.1 | 63.3±5.0 | 20.7±3.1 | 19.3±3.1 | 56.3 |
| FreqPolicy | 98.7±1.2 | 68.0±3.5 | 52.0±3.5 | 78.0±8.0 | 58.7±4.6 | 20.7±5.0 | 18.7±3.1 | 56.4 |
| FGO (Ours) | 100.0±0.0 | 69.3±2.3 | 55.3±1.2 | 81.3±6.4 | 66.7±1.2 | 24.0±3.5 | 25.3±2.3 | 60.3 |

Table 2: Comparison of success rates (%) on the Adroit and DexArt benchmarks. Results are computed across 3 training seeds.


| Method | ATV ↓ (×10⁻³ rad/s) | JerkRMS ↓ (rad/s³) | Training Time ↓ (GPU h) | Inference Speed ↓ (ms) |
|---|---|---|---|---|
| DP3 | 14.83±0.17 | 50.87±1.27 | 0.47 | 39.49 |
| DiT-Policy | 14.84±0.22 | 51.01±1.16 | 0.42 | 17.20 |
| FreqPolicy | 15.25±0.39 | 46.91±1.58 | 0.35 | 33.49 |
| FGO (Ours) | 14.76±0.17 | 40.79±0.46 | 0.48 | 44.22 |

Table 3: Comparison of Action Total Variation (ATV), JerkRMS, training time, and inference speed.

@article{wang2026fgo,
author = {Wang, Junlin},
title = {Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal},
journal = {arXiv preprint},
year = {2026},
}