DMDiff: Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography

1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University   2Fudan University   3Tsinghua University
ICCV 2025

Abstract

Metalenses offer significant potential for ultra-compact computational imaging but face challenges from complex optical degradation and computational restoration difficulties. Existing methods typically rely on precise optical calibration or massive paired datasets, which are non-trivial for real-world imaging systems. Furthermore, a lack of control over the inference process often results in undesirable hallucinated artifacts. We introduce Degradation-Modeled Multipath Diffusion for tunable metalens photography, leveraging powerful natural image priors from pretrained models instead of large datasets. Our framework uses positive, neutral, and negative-prompt paths to balance high-frequency detail generation, structural fidelity, and suppression of metalens-specific degradation, alongside pseudo data augmentation. A tunable decoder enables controlled trade-offs between fidelity and perceptual quality. Additionally, a spatially varying degradation-aware attention (SVDA) module adaptively models complex optical and sensor-induced degradation. Finally, we design and build a millimeter-scale MetaCamera for real-world validation. Extensive results show that our approach outperforms state-of-the-art methods, achieving high-fidelity and sharp image reconstruction. More materials: https://dmdiff.github.io/.

Method Overview

To tackle the challenges of metalens-based imaging, we propose a degradation-modeled multipath diffusion framework that leverages pretrained large-scale generative diffusion models for tunable metalens photography. Our approach addresses three key challenges: complex metalens degradations, limited paired training data, and hallucinations in generative models. Drawing on the powerful natural image priors of the base generative diffusion model, our method reconstructs vivid and realistic images from only a small training dataset. To further enhance restoration, we propose a spatially varying degradation-aware (SVDA) attention module, which quantifies optical aberrations and sensor-induced noise to guide the restoration process. Additionally, we introduce the Degradation-Modeled Multipath Diffusion (DMDiff) framework, incorporating positive, neutral, and negative-prompt paths to balance detail enhancement and structural fidelity while mitigating metalens-specific distortions. Finally, we design an instantly tunable decoder, enabling dynamic control over reconstruction quality to suppress hallucinations.
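The three prompt paths can be viewed as a multi-direction variant of classifier-free guidance. As a minimal sketch (the combination rule, weight names `w_pos`/`w_neg`, and their values are assumptions for illustration, not the paper's exact formula), the neutral path anchors structural fidelity, the positive path is amplified toward detail generation, and the negative path is subtracted to suppress metalens-specific degradation:

```python
import numpy as np

def multipath_guidance(eps_pos, eps_neu, eps_neg, w_pos=1.5, w_neg=0.5):
    """Combine noise predictions from the three prompt paths.

    Hypothetical sketch: start from the neutral (fidelity-anchoring)
    prediction, push along the positive-prompt direction to add detail,
    and push away from the negative-prompt direction to suppress
    degradation. Weights are illustrative assumptions.
    """
    return (eps_neu
            + w_pos * (eps_pos - eps_neu)   # amplify detail-generating direction
            - w_neg * (eps_neg - eps_neu))  # steer away from degradation direction

# toy check: with zero weights, the neutral prediction passes through unchanged
e = np.ones((2, 2))
out = multipath_guidance(2 * e, e, 0 * e, w_pos=0.0, w_neg=0.0)
```

With nonzero weights the result moves away from the neutral baseline in proportion to how far the positive and negative predictions diverge from it.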

Overview of our method.


More Results

Qualitative comparisons of different methods on real-world images captured by our system.
Qualitative comparisons of different methods on our unseen test dataset (zoom in for details).

Instantly Tunable Decoding Demonstration (α: 0–1.3)
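The α knob trades fidelity against perceptual quality at inference time, with values above 1 extrapolating past the generative endpoint. A minimal sketch, assuming the tunable decoder blends a fidelity-oriented output and a generative output by linear interpolation (the blending rule and the function name `tunable_decode` are illustrative assumptions):

```python
def tunable_decode(fidelity_out, generative_out, alpha):
    """Blend two decoder outputs at inference time.

    Hypothetical sketch: alpha = 0 keeps the fidelity-oriented
    reconstruction; alpha = 1 uses the generative reconstruction;
    alpha in (1, 1.3] extrapolates toward stronger generated detail.
    """
    if not 0.0 <= alpha <= 1.3:
        raise ValueError("alpha is demonstrated over the range [0, 1.3]")
    return (1.0 - alpha) * fidelity_out + alpha * generative_out
```

Because the blend is computed per inference call, no retraining is needed to move along the fidelity–perception trade-off.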

[Interactive comparison sliders: metalens input vs. reconstruction at varying α.]

BibTeX


      @article{zhang2025dmdiff,
        title={Degradation-Modeled Multipath Diffusion for Tunable Metalens Photography},
        author={Zhang, Jianing and Zhu, Jiayi and Ji, Feiyu and Yang, Xiaokang and Yuan, Xiaoyun},
        journal={arXiv preprint arXiv:2506.22753},
        year={2025}
      }