PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis

Jinrang Jia, Zhenjia Li, Yijiang Hu, Yifeng Shi
Ke Holdings Inc.
†Corresponding author

The overall pipeline of PanoWorld

Abstract

Generating a consistent whole-house VR tour from a floorplan and style reference requires both photorealistic panoramas and cross-view spatial coherence. Pure 2D generators produce appealing single panoramas but re-imagine geometry and materials when the viewpoint changes, whereas monolithic 3D generation becomes expensive and loses fine texture at multi-room scale. We introduce PanoWorld, a generative spatial world model that treats whole-house synthesis as autoregressive generation of node-based 360-degree panoramas, matching the discrete navigation used by real VR tour products. PanoWorld uses a floorplan-derived 3D shell as a global geometric proxy and a dynamic 3D Gaussian Splatting cache as renderable spatial memory. A feed-forward panoramic LRM designed for metric-scale multi-room 360-degree inputs lifts generated panoramas into local 3DGS updates, while Room-aware Group Attention suppresses cross-room feature interference. A topology-aware progressive caching strategy fuses these local updates without repeatedly reconstructing the full history. By decoupling shell-based geometry guidance from cache-rendered visual memory, PanoWorld preserves high-frequency 2D synthesis quality while improving cross-node layout and material consistency.

Interactive Whole-House Panorama Tour

Explore the whole-house panoramic results generated by PanoWorld through a VR-style viewer. Given a floorplan and a style reference, PanoWorld synthesizes a coherent set of panoramas covering the entire house that remain consistent across viewpoints.


Whole-House Room-Aware Panoramic LRM

Room-aware panoramic LRM. Grouped attention allows dense intra-room interaction and restricts cross-room communication to topological boundaries.
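The grouped attention pattern described in the caption can be illustrated with a small NumPy sketch. It is not the PanoWorld implementation. It assumes each token carries a room index, that a per-token flag marks tokens lying on a topological boundary (for example near a doorway), and that a room adjacency matrix is available. The names `room_group_mask`, `masked_attention`, `is_boundary`, and `adjacency` are hypothetical.

```python
import numpy as np

def room_group_mask(room_ids, is_boundary, adjacency):
    """Boolean (N, N) mask: True where token i may attend to token j.

    Assumption (not from the paper): cross-room attention is allowed only
    between boundary tokens of rooms that are adjacent in the floorplan.
    """
    room_ids = np.asarray(room_ids)
    is_boundary = np.asarray(is_boundary, dtype=bool)
    adjacency = np.asarray(adjacency, dtype=bool)
    # Dense interaction inside each room.
    same_room = room_ids[:, None] == room_ids[None, :]
    # Look up room-to-room adjacency for every token pair.
    rooms_adjacent = adjacency[room_ids[:, None], room_ids[None, :]]
    both_boundary = is_boundary[:, None] & is_boundary[None, :]
    return same_room | (rooms_adjacent & both_boundary)

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every token attends to itself, so each row has a finite maximum.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In this sketch a token in an isolated room attends only within its own room, so features from unrelated rooms cannot leak into it, which is the interference the caption refers to.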

Topology-Aware Progressive 3DGS Caching

Progressive 3DGS caching. PanoWorld updates spatial memory through local topology-aware increments instead of full history reconstruction.
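The idea of local, topology-aware increments can be sketched as a cache that stores one chunk of Gaussians per tour node and gathers context only from nearby nodes in the tour graph. This is a toy data structure under stated assumptions, not the authors' caching strategy: the class `ProgressiveGaussianCache`, the `hops` radius, and the flat `(M, D)` parameter layout are all hypothetical.

```python
from collections import deque
import numpy as np

class ProgressiveGaussianCache:
    """Toy per-node cache of Gaussian parameters (hypothetical design)."""

    def __init__(self, edges, dim, hops=1):
        self.neighbors = {}
        for a, b in edges:
            self.neighbors.setdefault(a, set()).add(b)
            self.neighbors.setdefault(b, set()).add(a)
        self.dim = dim        # parameters per Gaussian (assumed flat layout)
        self.hops = hops      # topological radius used to gather context
        self.chunks = {}      # node id -> (M, dim) array

    def nearby(self, node):
        """Cached nodes within `hops` edges of `node` (breadth-first search)."""
        seen, queue, found = {node}, deque([(node, 0)]), []
        while queue:
            cur, depth = queue.popleft()
            if cur != node and cur in self.chunks:
                found.append(cur)
            if depth < self.hops:
                for nxt in sorted(self.neighbors.get(cur, ())):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, depth + 1))
        return found

    def context(self, node):
        """Gaussians that would be rendered as visual memory for `node`."""
        parts = [self.chunks[n] for n in self.nearby(node)]
        if not parts:
            return np.empty((0, self.dim))
        return np.concatenate(parts, axis=0)

    def update(self, node, gaussians):
        """Store the local update for `node`; older chunks are left untouched."""
        self.chunks[node] = np.asarray(gaussians, dtype=float)
```

The cost of `update` depends only on the size of the new chunk, and `context` touches only chunks within the chosen radius, which mirrors the stated goal of avoiding full-history reconstruction.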

PanoWorld-LRM Whole-House Reconstruction Results (12 Input Views)

Qualitative Comparison on Whole-House Panorama Synthesis

We compare PanoWorld with representative adapted baselines on multi-node panorama generation.

PanoWorld Qualitative Results Under Different Target Styles

PanoWorld preserves cross-room geometry and material identity while generating furnished panoramas under different target styles.

Whole-House LRM Reconstruction Visualization

The comparison shows room-level panorama renderings for different reconstruction methods.

Quantitative Comparison

Panorama Synthesis Quality

HPSv3 measures single-node aesthetic quality, CLIP-I Style measures image-reference style consistency, and cross-node consistency is evaluated by Overlap PSNR (PSNRov). Bar heights are normalized per metric for readability, while exact values are shown above each bar.
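As a rough illustration of an overlap-restricted PSNR, the sketch below computes the standard PSNR over the pixels selected by a mask. It assumes the two views have already been warped into a common frame and that the overlap mask is given. How the page's PSNRov establishes the overlap is not specified here, so the function `overlap_psnr` and its arguments are hypothetical.

```python
import numpy as np

def overlap_psnr(img_a, img_b, overlap_mask, max_val=1.0):
    """PSNR in dB over pixels where `overlap_mask` is True.

    img_a, img_b: (H, W, C) arrays aligned to the same frame (assumed).
    overlap_mask: (H, W) boolean array marking co-visible pixels (assumed).
    """
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    m = np.asarray(overlap_mask, dtype=bool)
    if not m.any():
        return float("nan")   # no overlap, metric undefined
    mse = np.mean((a[m] - b[m]) ** 2)
    if mse == 0:
        return float("inf")   # identical content in the overlap
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Restricting the error to co-visible pixels means the score reflects how well two nodes agree on shared content rather than how similar two different viewpoints look overall.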

Whole-House Reconstruction Quality

8 Panorama Inputs

12 Panorama Inputs

Metrics are computed from panorama renderings of reconstructed 3D representations. LPIPS is a lower-is-better metric, and its bar heights are inverted only for visual comparison.
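The per-metric bar normalization and the inversion used for lower-is-better metrics could be computed as in the sketch below. The exact formula behind the charts is not stated on this page, so the choice here (divide by the best value, or divide the best value by each entry when lower is better) is an assumption, and `bar_heights` is a hypothetical helper.

```python
import numpy as np

def bar_heights(values, lower_is_better=False):
    """Map one metric's raw scores to heights in (0, 1], best method at 1.

    Assumes all values are positive. Exact values would still be printed
    above each bar; only the drawn height is transformed.
    """
    v = np.asarray(values, dtype=np.float64)
    if lower_is_better:
        return v.min() / v    # smallest (best) score gets the tallest bar
    return v / v.max()        # largest (best) score gets the tallest bar
```

For example, `bar_heights([0.2, 0.4], lower_is_better=True)` gives the method with the lower LPIPS the taller bar, so taller always reads as better across every panel.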

BibTeX

@misc{jia2026panoworldgenerativespatialworld,
      title={PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis},
      author={Jinrang Jia and Zhenjia Li and Yijiang Hu and Yifeng Shi},
      year={2026},
      eprint={2605.17916},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.17916},
}