STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction
University of Science and Technology of China
CVPR 2026
Abstract
Online 3D reconstruction from streaming inputs requires both long-term temporal consistency and efficient memory usage. While causal VGGT transformers address this challenge through a key-value (KV) cache mechanism, the cache grows linearly with the number of frames and becomes a significant memory bottleneck. When memory constraints trigger early eviction, reconstruction quality and temporal consistency deteriorate markedly. In this work, we observe that attention patterns in causal transformers for 3D reconstruction exhibit intrinsic spatio-temporal sparsity. Leveraging this insight, we propose STAC, a Spatio-Temporally Aware Cache compression framework designed for streaming 3D reconstruction with large causal transformers. STAC incorporates three key components: a Working Temporal Token Caching mechanism that preserves long-term informative tokens based on decayed cumulative attention scores; a Long-term Spatial Token Caching scheme that consolidates spatially redundant tokens into voxel-aligned representations for memory-efficient storage; and a Chunk-based Multi-frame Optimization strategy that jointly optimizes consecutive frames to enhance temporal coherence and leverage GPU parallelism. Extensive experiments demonstrate that STAC achieves state-of-the-art reconstruction quality while reducing memory consumption by 8.5× and accelerating inference by 3.5×, enabling scalable, real-time 3D reconstruction in streaming settings.
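To make the two caching ideas in the abstract concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the STAC implementation: the abstract does not give the scoring formula, the decay value, or the merging rule, so the exponential decay update, the top-k retention, the feature averaging, and the names `update_scores`, `keep_top_k`, and `voxel_merge` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def update_scores(scores, attn_to_cache, decay=0.9):
    """Decayed cumulative attention (assumed form): old score is
    multiplied by `decay`, then the attention mass each cached token
    received from the newest frame is added."""
    return [decay * s + a for s, a in zip(scores, attn_to_cache)]


def keep_top_k(scores, budget):
    """Return the indices of the `budget` highest-scoring cached tokens,
    in their original (temporal) order. Top-k retention is an assumption."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def voxel_merge(positions, features, voxel_size):
    """Group tokens by the voxel containing their 3D position and average
    the features in each voxel (averaging is an assumed merge rule)."""
    buckets = defaultdict(list)
    for pos, feat in zip(positions, features):
        key = tuple(int(c // voxel_size) for c in pos)  # floor to voxel index
        buckets[key].append(feat)
    return {
        key: [sum(col) / len(feats) for col in zip(*feats)]
        for key, feats in buckets.items()
    }


# Toy usage: three cached tokens scored over two incoming frames.
scores = [0.0, 0.0, 0.0]
scores = update_scores(scores, [0.5, 0.1, 0.4], decay=0.5)
scores = update_scores(scores, [0.0, 0.6, 0.4], decay=0.5)
kept = keep_top_k(scores, budget=2)

# Toy usage: the first two tokens share a voxel and are merged into one entry.
merged = voxel_merge(
    positions=[(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.0, 0.0)],
    features=[[1.0, 3.0], [3.0, 5.0], [7.0, 9.0]],
    voxel_size=1.0,
)
```

In this toy run the first token is attended to early but not afterwards, so its decayed score falls below the other two and it is the one dropped under a budget of two. The voxel step shrinks three stored tokens to two entries, which is the kind of spatial redundancy the abstract refers to.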
Pipeline
BibTeX
@inproceedings{wang2026stac,
title={STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction},
author={Wang, Runze and Song, Yuxuan and Cai, Youcheng and Liu, Ligang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026}
}
