ART predicts the geometry, texture, and articulation structure of each part of the articulated object from image inputs.
Because the reconstruction is part-based, ART's output can be directly converted to URDF format and exported to existing simulators, enabling downstream embodied AI applications, e.g., humanoid interaction.
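To make the URDF export step concrete, here is a minimal sketch of serializing a part-based reconstruction to URDF. The `Part` dataclass, its field names, and the mesh paths are illustrative assumptions, not ART's actual data structures or export code.

```python
# Minimal sketch: serializing a part-based reconstruction to URDF.
# The Part dataclass and its fields are illustrative assumptions,
# not ART's actual data structures.
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Part:
    name: str                 # e.g. "door"
    mesh_path: str            # path to this part's exported mesh
    parent: str | None        # name of the parent part; None for the root
    joint_type: str = "fixed" # "fixed", "revolute", or "prismatic"
    origin_xyz: tuple = (0.0, 0.0, 0.0)  # joint origin w.r.t. the parent link
    axis_xyz: tuple = (0.0, 0.0, 1.0)    # articulation axis (unused if fixed)


def parts_to_urdf(parts: list[Part], robot_name: str = "articulated_object") -> str:
    robot = ET.Element("robot", name=robot_name)
    for p in parts:
        # One link per rigid part, referencing its reconstructed mesh.
        link = ET.SubElement(robot, "link", name=p.name)
        geom = ET.SubElement(ET.SubElement(link, "visual"), "geometry")
        ET.SubElement(geom, "mesh", filename=p.mesh_path)
        if p.parent is None:
            continue  # the root link has no incoming joint
        # One joint per non-root part, encoding the predicted articulation.
        joint = ET.SubElement(robot, "joint",
                              name=f"{p.parent}_to_{p.name}", type=p.joint_type)
        ET.SubElement(joint, "parent", link=p.parent)
        ET.SubElement(joint, "child", link=p.name)
        ET.SubElement(joint, "origin", xyz=" ".join(map(str, p.origin_xyz)))
        if p.joint_type != "fixed":
            # Note: strict URDF parsers also expect a <limit> element here.
            ET.SubElement(joint, "axis", xyz=" ".join(map(str, p.axis_xyz)))
    return ET.tostring(robot, encoding="unicode")


# Example: a cabinet body with one revolute door.
urdf = parts_to_urdf([
    Part("body", "body.obj", parent=None),
    Part("door", "door.obj", parent="body", joint_type="revolute",
         origin_xyz=(0.3, 0.0, 0.0), axis_xyz=(0.0, 0.0, 1.0)),
])
```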
We introduce ART, the Articulated Reconstruction Transformer—a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images. Previous methods for articulated object reconstruction either rely on slow optimization with fragile cross-state correspondences or use feed-forward models limited to specific object categories. In contrast, ART treats articulated objects as assemblies of rigid parts, formulating reconstruction as a part-based prediction problem. Our newly designed transformer architecture maps sparse image inputs to a set of learnable part slots, from which ART jointly decodes unified representations for individual parts, including their 3D geometry, texture, and explicit articulation parameters. The resulting reconstructions are physically interpretable and readily exportable to standard simulation formats. Trained on a large-scale, diverse dataset with per-part supervision, and evaluated across multiple benchmarks, ART achieves significant improvements over existing baselines and establishes a new state of the art for articulated object reconstruction from image inputs.
ART overview. Multi-view, multi-state image inputs with known camera poses are tokenized and processed by a transformer alongside learnable part slot tokens. Two separate decoders then predict each part's geometry, texture, and articulation structure. These components can then be composed and rendered to reconstruct the articulated object in different states.
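As a rough illustration of the data flow in the overview (not the authors' implementation), the PyTorch sketch below passes image tokens and learnable part slot tokens through a shared transformer, then reads out the slot tokens with two decoder heads. All layer sizes, token counts, and output parameterizations are assumptions for the sketch.

```python
# Rough sketch of the overview data flow. Layer sizes, token counts, and
# output parameterizations are illustrative assumptions, not ART's
# actual architecture.
import torch
import torch.nn as nn


class PartSlotTransformer(nn.Module):
    def __init__(self, dim=512, num_slots=16, depth=8, heads=8):
        super().__init__()
        # Learnable part slot tokens, one per candidate rigid part.
        self.part_slots = nn.Parameter(torch.randn(1, num_slots, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # Decoder 1: per-part latent for geometry/texture (fed to a renderer).
        self.geometry_texture_head = nn.Linear(dim, 256)
        # Decoder 2: explicit articulation parameters per part
        # (here: joint-type logits + axis direction + pivot = 2 + 3 + 3).
        self.articulation_head = nn.Linear(dim, 8)

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim) tokens from posed multi-view,
        # multi-state images (tokenizer not shown here).
        B = image_tokens.shape[0]
        slots = self.part_slots.expand(B, -1, -1)
        tokens = torch.cat([image_tokens, slots], dim=1)
        tokens = self.backbone(tokens)
        part_tokens = tokens[:, -slots.shape[1]:]  # keep only the slot outputs
        return (self.geometry_texture_head(part_tokens),
                self.articulation_head(part_tokens))


# Example: 4 views x 2 articulation states, 196 patch tokens per image.
model = PartSlotTransformer()
img_tokens = torch.randn(1, 4 * 2 * 196, 512)
geom_tex, articulation = model(img_tokens)
```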
@article{li2025art,
title = {ART: Articulated Reconstruction Transformer},
author = {Li, Zizhang and Zhang, Cheng and Li, Zhengqin and Howard-Jenkins, Henry and Lv, Zhaoyang and Geng, Chen and Wu, Jiajun and Newcombe, Richard and Engel, Jakob and Dong, Zhao},
journal = {arXiv preprint arXiv:2512.14671},
year = {2025}
}