REACT3D

Recovering Articulations for Interactive Physical 3D Scenes

¹ETH Zürich · ²Max Planck Institute for Informatics · ³Microsoft



We introduce REACT3D, a scalable zero-shot framework that
converts static 3D scenes into simulation-ready interactive replicas with consistent geometry.

Abstract

Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework. REACT3D thus provides a practical foundation for scalable interactive scene generation and lowers the barrier to large-scale research on articulated scene understanding.


Motivation

Embodied AI requires large-scale, physically interactive 3D environments, but building them today hinges on slow, expert-heavy annotations of parts, joints, and motions. Meanwhile, vast static 3D datasets already exist yet remain inert for interaction and simulation. Our motivation is to unlock the latent interactivity of these static assets—recovering articulations and consistent geometry automatically—so that researchers can train and evaluate agents at scale without new data collection or manual labeling. By converting abundant static scenes into reliable interactive environments, we aim to collapse the data bottleneck that currently limits progress in articulated scene understanding and embodied intelligence.

REACT3D

We propose a novel, application-driven framework for converting high-quality static scenes into physics-aware, interactive environments:

  • Methodology: REACT3D detects openable objects, segments movable parts, infers joint types and motion parameters, completes hidden geometry, and exports simulation-ready assets.
  • Performance: We achieve state-of-the-art detection/segmentation and articulation results across diverse indoor scenes, outperforming strong baselines.
  • Utility: The resulting assets integrate with ROS, Isaac Sim, PyBullet, and Open3D, enabling large-scale training and evaluation without new data or manual labels (see the loading sketch below).
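
As a concrete example of the utility claim above, the following minimal sketch loads an exported scene into PyBullet and opens its revolute joints. It assumes the asset is exported as URDF, which PyBullet reads natively; the path scene.urdf is a placeholder, not a file shipped by REACT3D.

import pybullet as p

# Connect headless; use p.GUI for the interactive viewer.
client = p.connect(p.DIRECT)
p.setGravity(0, 0, -9.81)

# Hypothetical path: a REACT3D-exported interactive scene as URDF.
scene = p.loadURDF("scene.urdf", useFixedBase=True)

# Drive every revolute joint (e.g. doors, lids) toward its upper limit.
for j in range(p.getNumJoints(scene)):
    info = p.getJointInfo(scene, j)
    joint_type, upper_limit = info[2], info[9]
    if joint_type == p.JOINT_REVOLUTE:
        p.setJointMotorControl2(scene, j, p.POSITION_CONTROL,
                                targetPosition=upper_limit, force=20.0)

for _ in range(240):  # one simulated second at the default 240 Hz
    p.stepSimulation()

p.disconnect(client)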


System Overview


Given a static 3D scene, our method first applies open-vocabulary detection to identify openable objects, then segmentation to extract their movable parts. We estimate articulations and complete hidden geometry to obtain interactive objects, which are finally integrated with the static background to produce a simulation-ready interactive scene.
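
To make "simulation-ready" concrete, here is a minimal sketch of what one exported articulated object can look like as URDF, one of the widely supported formats mentioned above. All names, mesh paths, limits, and inertia values are illustrative placeholders, not the exact schema REACT3D emits.

import pathlib

# Hypothetical two-link object: a static base plus a door rotating about a
# vertical hinge. The joint type, axis, origin, and limits correspond to the
# quantities produced by the articulation-estimation stage.
CABINET_URDF = """<?xml version="1.0"?>
<robot name="cabinet">
  <link name="base">
    <visual><geometry><mesh filename="base.obj"/></geometry></visual>
    <collision><geometry><mesh filename="base.obj"/></geometry></collision>
  </link>
  <link name="door">
    <inertial>
      <mass value="1.0"/>
      <inertia ixx="0.01" iyy="0.01" izz="0.01" ixy="0" ixz="0" iyz="0"/>
    </inertial>
    <visual><geometry><mesh filename="door.obj"/></geometry></visual>
    <collision><geometry><mesh filename="door.obj"/></geometry></collision>
  </link>
  <joint name="door_hinge" type="revolute">
    <parent link="base"/>
    <child link="door"/>
    <origin xyz="0.25 0.0 0.0"/>  <!-- hinge pivot in the base frame -->
    <axis xyz="0 0 1"/>           <!-- vertical rotation axis -->
    <limit lower="0.0" upper="1.57" effort="10.0" velocity="1.0"/>
  </joint>
</robot>
"""

pathlib.Path("cabinet.urdf").write_text(CABINET_URDF)

A prismatic part such as a drawer would use type="prismatic" with the same axis and limit fields.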


Interactive Object Generation

Step 1: Predicted 2D Mask → Step 2: Segmented Mesh → Step 3: Predicted Joint → Step 4: Refined Joint → Step 5: Interactive Object → Step 6: Opened Object


From left to right, the figure shows key intermediate results of interactive object generation. In the last column, the thin red line highlights the contour of the base part.
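
The final "opened" state (Step 6) amounts to transforming the movable part about the refined joint from Step 4. A minimal numpy sketch, assuming a revolute joint given by a pivot point and a unit axis (function and variable names are ours, not from the released code):

import numpy as np

def open_part(vertices: np.ndarray, pivot: np.ndarray,
              axis: np.ndarray, angle: float) -> np.ndarray:
    """Rotate part vertices (N, 3) by `angle` radians about the joint axis.

    `pivot` is a point on the axis and `axis` its unit direction, i.e. the
    refined joint parameters. Uses the Rodrigues rotation formula.
    """
    k = axis / np.linalg.norm(axis)
    # Cross-product matrix of k.
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)
    return (vertices - pivot) @ R.T + pivot

# Example: swing a door-like part 90 degrees about a vertical hinge.
verts = np.array([[0.5, 0.0, 0.0], [0.5, 0.0, 1.0]])
opened = open_part(verts, pivot=np.zeros(3),
                   axis=np.array([0.0, 0.0, 1.0]), angle=np.pi / 2)

A prismatic joint (e.g. a drawer) replaces the rotation with a pure translation, vertices + d * k.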


Qualitative Results


We show three example input scenes from ScanNet++ (top row) and their corresponding interactive replicas generated by REACT3D (bottom row), visualized in Isaac Sim. Our method robustly handles diverse object types, shapes, sizes, and articulation mechanisms, producing high-quality interactive scenes.

Manipulating Interactive Scenes in Simulator

We showcase the interactive scenes generated by REACT3D in two widely used simulation platforms, Isaac Sim and ROS. The videos demonstrate the physical interactivity of various objects: users can manipulate each object individually through the GUI, highlighting the versatility and applicability of our generated scenes across simulation environments.
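
In PyBullet, for instance, the same per-joint GUI manipulation can be reproduced with debug sliders. The sketch below attaches one slider per movable joint of a loaded scene; as before, scene.urdf is a hypothetical export path.

import pybullet as p

p.connect(p.GUI)
scene = p.loadURDF("scene.urdf", useFixedBase=True)

# One slider per non-fixed joint, spanning that joint's motion range.
sliders = {}
for j in range(p.getNumJoints(scene)):
    info = p.getJointInfo(scene, j)
    name, jtype = info[1].decode(), info[2]
    lo, hi = info[8], info[9]
    if jtype != p.JOINT_FIXED:
        sliders[j] = p.addUserDebugParameter(name, lo, hi, lo)

# Track the sliders while the viewer is open.
while p.isConnected():
    for j, sid in sliders.items():
        p.setJointMotorControl2(scene, j, p.POSITION_CONTROL,
                                targetPosition=p.readUserDebugParameter(sid))
    p.stepSimulation()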


Isaac Sim


ROS

BibTeX

@misc{huang2025react3drecoveringarticulationsinteractive,
      title={REACT3D: Recovering Articulations for Interactive Physical 3D Scenes}, 
      author={Zhao Huang and Boyang Sun and Alexandros Delitzas and Jiaqi Chen and Marc Pollefeys},
      year={2025},
      eprint={2510.11340},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.11340}, 
}