BluRef

★ CVPR 2026 (Main Conference)

Unsupervised Image Deblurring with Dense-Matching References

Bang-Dang Pham¹† · Anh Tran² · Cuong Pham²,³ · Minh Hoai²,⁴

¹University of Wisconsin–Madison

²Qualcomm AI Research*

³Posts & Telecom. Inst. of Tech.

⁴University of Adelaide

†Work done while at Qualcomm AI Research. *Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.

Paper arXiv

Code (TBD)

Video (TBD)

TL;DR — BluRef is the first unpaired reference-guided framework for unsupervised image deblurring. Instead of requiring costly paired blur–sharp datasets, BluRef uses easily obtainable sharp reference frames (captured nearby in time, but never the exact ground truth) to generate pseudo-sharp supervision via dense matching — only during training. At test time, BluRef operates as a direct single-pass deblurring model with no reference frames needed.

Three paradigms for image deblurring. (Left) Supervised methods require costly paired blur–sharp datasets, limiting scalability. (Middle) Reblurring-based methods (e.g., Blur2Blur) rely on indirect mappings through intermediate blur domains. (Right) BluRef (ours) directly learns from unpaired blurry and sharp reference images within the target domain. Reference frames are used only during training to generate pseudo-sharp supervision — at inference, BluRef is a standard single-pass deblurring network.

Abstract

Deblurring Without Paired Data or Pre-trained Networks

This paper introduces a novel unsupervised approach for image deblurring that utilizes a simple process for training data collection, thereby enhancing the applicability and effectiveness of deblurring methods. Our technique does not require meticulously paired data of blurred and corresponding sharp images. Instead, it uses unpaired blurred and sharp images of similar scenes to generate pseudo-ground truth data by leveraging a dense matching model to identify correspondences between a blurry image and reference sharp images. Thanks to the simplicity of this training data collection process, our approach does not rely on existing paired training data or pre-trained networks, making it more adaptable to various scenarios and suitable for networks of different sizes, including those designed for low-resource devices. We demonstrate that this novel approach achieves state-of-the-art unsupervised performance, marking a significant advancement in the field of image deblurring.

Contributions

Three Key Contributions

🎯

First Unpaired Reference-Guided Deblurring

BluRef is the first framework to use unpaired sharp reference images for training only — no paired supervision, no pre-trained deblurring network, and no reference frames needed at inference. Unlike supervised reference-based methods that require exact correspondences, BluRef works with easily collected, temporally displaced frames.

♻️

Reusable Pseudo-Ground Truth

BluRef's pseudo-GT images can be reused to train models of varying capacities, including lightweight networks for mobile deployment — something other unsupervised methods cannot provide.

🏆

State-of-the-Art Unsupervised Results

Extensive experiments show BluRef outperforms all prior unsupervised methods. On the real-world RB2V benchmark, NAFNet-BluRef surpasses the supervised Restormer upper bound (27.87 vs. 27.43 dB). On PhoneCraft (real smartphone data), BluRef achieves the best NIQE and FID scores.

Method

BluRef — Reference-based Deblurring

BluRef frames deblurring as an iterative enhancement process. Given a blurry image I_blur and N unpaired sharp reference images from similar scenes, our goal is to estimate the corresponding sharp image without any paired supervision. Crucially, none of the reference images needs to be the exact sharp version of the blurry input — they can be captured at different times and from different spatial perspectives. The dense matching module and pseudo-sharp generation are used only during training; at test time, BluRef operates as a direct single-pass deblurring network.

Full pipeline

Input & References

Collect I_blur and N unpaired sharp references {I_refⁿ}. Feed into Dense Matching 𝒢.

No alignment needed — references can come from different viewpoints and times.

Pseudo-Sharp Generation

𝒢 finds pixel-level correspondences → pseudo-sharp I_pseudo^(k) + confidence mask M_pseudo^(k).

Built on PDC-Net+ / GLU-Net-GOCor and trained in a self-supervised manner on sharp images only.

Network Optimization

Train NAFNet / Restormer using pseudo-sharp as GT under masked loss ℒ_deblur.

Problem: Only one fixed pseudo-GT version — low quality early on, model never improves beyond it.

Iterative Refinement ⭐

Feed improved I_deblur^(k) back → regenerate better pseudo-GT each epoch.

Solution: Better deblurred → better matching → better pseudo-GT → better model. No iteration at test time.
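The masked loss ℒ_deblur used in the Network Optimization step is not spelled out on this page. A minimal sketch, assuming an L1 penalty averaged over the pixels that the confidence mask marks as matched; `masked_l1_loss` is a hypothetical name, and images are flat lists of floats for brevity:

```python
def masked_l1_loss(pred, pseudo, mask, eps=1e-8):
    """Mean absolute error over pixels the confidence mask marks as matched.

    Assumption: the actual loss used by BluRef may differ (e.g. another norm
    or a soft mask); this only illustrates the masking idea.
    """
    if not (len(pred) == len(pseudo) == len(mask)):
        raise ValueError("pred, pseudo and mask must have the same length")
    weighted_error = sum(m * abs(p - t) for p, t, m in zip(pred, pseudo, mask))
    # Normalize by the number of matched pixels so sparse masks are not penalized.
    return weighted_error / (sum(mask) + eps)


pred = [0.2, 0.5, 0.9, 0.4]    # network output
pseudo = [0.0, 0.5, 0.1, 0.6]  # pseudo-sharp target from dense matching
mask = [1.0, 1.0, 0.0, 1.0]    # third pixel has no reliable correspondence
loss = masked_l1_loss(pred, pseudo, mask)
```

The third pixel has a large error but contributes nothing, because its pseudo-sharp value is not trusted.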

💡 Why iteration is essential

If the deblurring network trains with a single, fixed pseudo-sharp target (Step 3 alone), performance is limited by the initial low-quality pseudo-GT. By adding the iterative loop (Step 4), improved deblurred estimates feed back into dense matching to regenerate better pseudo-GT at each epoch — creating a self-improving cycle that converges toward ground-truth quality.
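The feedback loop described above can be sketched with toy stand-ins. Everything here is an illustrative assumption: a single scalar gain plays the role of the deblurring network, and a per-pixel threshold test plays the role of dense matching. Only the control flow (match, update, re-predict, repeat) mirrors the description.

```python
def refine(blur, refs, match_fn, update_fn, predict_fn, state, num_rounds):
    """Alternate pseudo-GT regeneration and model updates (training only)."""
    estimate = list(blur)  # the first round matches the raw blurry input
    coverage = []
    for _ in range(num_rounds):
        pseudo, mask = match_fn(estimate, refs)       # regenerate pseudo-GT
        coverage.append(sum(mask))                    # matched pixels this round
        state = update_fn(state, blur, pseudo, mask)  # fit on matched pixels
        estimate = predict_fn(state, blur)            # improved estimate
    return state, coverage


# Toy stand-ins (hypothetical, not the BluRef components).
def toy_match(estimate, ref, tol=0.12):
    """A pixel counts as matched if the estimate is already close to the reference."""
    mask = [1.0 if abs(e - r) <= tol else 0.0 for e, r in zip(estimate, ref)]
    return list(ref), mask


def toy_update(gain, blur, pseudo, mask, lr=0.5):
    """Damped least-squares fit of a scalar gain on matched pixels only."""
    num = sum(m * b * p for b, p, m in zip(blur, pseudo, mask))
    den = sum(m * b * b for b, m in zip(blur, mask))
    if den == 0:
        return gain
    return gain + lr * (num / den - gain)


def toy_predict(gain, blur):
    return [gain * b for b in blur]


blur = [0.1, 0.2, 0.3, 0.4]  # toy "blurry" signal (half contrast)
ref = [0.2, 0.4, 0.6, 0.8]   # toy sharp reference
gain, coverage = refine(blur, ref, toy_match, toy_update, toy_predict, 1.0, 4)
```

In this toy run the number of matched pixels grows from 1 to 2 to 4 as the estimate improves, which is the same qualitative behavior as the expanding confidence mask. At test time only `toy_predict` would be called.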

Three Pseudo-Sharp Aggregation Strategies

When multiple reference images are available, BluRef offers three strategies for combining their dense-matching outputs into a single pseudo-sharp target:

Weighted Average

Avg.

Apply 𝒟ℳ independently to each (I_blur, I_refⁿ) pair, then average the resulting pseudo-sharp images weighted by confidence masks.

Sequential Accumulation

Seq.

Iteratively refine I_pseudo by using the output of the previous iteration as input for the next, chaining sharp detail across references.

Progressive Averaging ⭐

Prog. (Best)

Combines both strategies — retaining sharp details from prior iterations and selectively enhancing previously unmatched regions. Achieves top performance.
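A rough sketch of two ingredients behind these strategies, under stated assumptions: `weighted_average` follows the Weighted Average description directly, while `fill_unmatched` is only a guess at the "retain prior details, fill previously unmatched regions" idea and is not the exact Progressive Averaging rule. Both function names are hypothetical, and images are flat lists of floats.

```python
def weighted_average(pseudos, masks, eps=1e-8):
    """Confidence-weighted mean over references.

    Returns the fused image and a coverage mask that is 1 wherever at least
    one reference produced a confident match.
    """
    fused, coverage = [], []
    for i in range(len(pseudos[0])):
        weight = sum(m[i] for m in masks)
        value = sum(m[i] * p[i] for p, m in zip(pseudos, masks))
        fused.append(value / (weight + eps))
        coverage.append(1.0 if weight > 0 else 0.0)
    return fused, coverage


def fill_unmatched(prev, prev_mask, new, new_mask):
    """Keep pixels that were already matched; fill the rest from a new match.

    Assumption: a simplified stand-in for combining earlier results with
    newly matched regions.
    """
    fused = [p if pm > 0 else n for p, pm, n in zip(prev, prev_mask, new)]
    mask = [max(pm, nm) for pm, nm in zip(prev_mask, new_mask)]
    return fused, mask


# Two references, three pixels; the last pixel is matched by neither.
pseudos = [[0.2, 0.8, 0.0], [0.4, 0.0, 0.0]]
masks = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
avg, avg_mask = weighted_average(pseudos, masks)

# A later match covers the remaining pixel without overwriting earlier ones.
merged, merged_mask = fill_unmatched(avg, avg_mask, [0.9, 0.9, 0.5], [1.0, 1.0, 1.0])
```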

Dense Matching — Self-supervised Training

The 𝒟ℳ model bridges the gap between blurry and sharp domains by learning to extract corresponding regions across both. It is trained in a self-supervised manner using only sharp images with synthetic warp augmentation, so no blurry training images are needed. The model takes a target image (blurry or deblurred) and a reference, and outputs a warped result I_trans and a confidence mask M_conf. Random motion-blur augmentation prevents the target domain's blur patterns from leaking into 𝒟ℳ training.
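One way such a self-supervised training pair could be built is sketched below. The assumptions are loud ones: the synthetic warp is reduced to an integer translation, motion blur is reduced to a horizontal box filter, and the blur is applied to the warped copy. The actual warp family and blur model are not described on this page, and all function names are hypothetical.

```python
import random


def translate(image, dx, dy, fill=0.0):
    """Shift a 2D image by (dx, dy); return the warped image and a validity mask."""
    h, w = len(image), len(image[0])
    warped = [[fill] * w for _ in range(h)]
    valid = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (x, y)
            if 0 <= sy < h and 0 <= sx < w:
                warped[y][x] = image[sy][sx]
                valid[y][x] = 1.0
    return warped, valid


def horizontal_blur(image, length):
    """Box blur along rows with edge clamping, a crude stand-in for motion blur."""
    w = len(image[0])
    half = length // 2
    out = []
    for row in image:
        blurred = []
        for x in range(w):
            xs = [min(max(x + k, 0), w - 1) for k in range(-half, half + 1)]
            blurred.append(sum(row[i] for i in xs) / len(xs))
        out.append(blurred)
    return out


def make_training_pair(sharp, rng, max_shift=2, max_blur=5):
    """Build (target, reference) with a known correspondence from one sharp image."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    warped, valid = translate(sharp, dx, dy)
    length = rng.choice(range(1, max_blur + 1, 2))  # odd kernel lengths only
    target = horizontal_blur(warped, length)
    # The flow is known by construction, so no manual labels are needed.
    return {"target": target, "reference": sharp, "flow": (dx, dy), "valid": valid}


sharp = [[float(x + y) for x in range(6)] for y in range(6)]
pair = make_training_pair(sharp, random.Random(0))
```

Because the warp is generated, the ground-truth correspondence (`flow`) and the region where it is defined (`valid`) come for free, which is what makes training on sharp images alone possible.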

Reference Frame Collection Protocol

A key advantage of BluRef is that collecting training data is significantly simpler than assembling paired blur–sharp datasets. For each blurry image, we collect N sharp reference images from nearby frames in the same video. The exact protocol differs per dataset, reflecting the practical constraints of each scenario:

Protocol for collecting sharp reference images. For structured benchmarks (GoPro, RB2V), consecutive sharp frames displaced by Δ are selected from both sides. For in-the-wild data (PhoneCraft), references are randomly sampled from separate sharp clips.

Synthetic

GoPro

1,050 blurry + 1,053 sharp images from the same sequences. Sharp references are consecutive frames shifted by Δ ∈ {1, 10, 20} on each side. N = 6 references per blurry image by default.

Real Blur

RB2V

5,400 blurry + 5,600 sharp images from real street scenes. Same temporal-shift protocol as GoPro. Tests BluRef's ability to handle real-world blur patterns and moving objects.

In-the-Wild

PhoneCraft

12 blurry + 11 sharp video clips (30–40s each, 60fps) from a handheld smartphone. Blurry and sharp clips are completely separate — references are randomly selected to eliminate any temporal correlation.
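The two collection protocols can be sketched as index selection. This is an interpretation, not the authors' script: it assumes N = 6 means three consecutive sharp frames on each side, starting Δ frames away from the blurry frame, and that PhoneCraft references are drawn uniformly from the separate sharp clips. Function names and the clip length used below are hypothetical.

```python
import random


def temporal_references(t, delta, num_frames, per_side=3):
    """Indices of sharp frames starting `delta` frames away on each side of frame t."""
    before = [t - delta - k for k in range(per_side)]
    after = [t + delta + k for k in range(per_side)]
    # Frames that fall outside the sequence are dropped.
    return sorted(i for i in before + after if 0 <= i < num_frames)


def random_references(sharp_clip_lengths, n, rng):
    """Sample (clip_id, frame_id) pairs uniformly from separate sharp clips."""
    refs = []
    for _ in range(n):
        clip = rng.randrange(len(sharp_clip_lengths))
        refs.append((clip, rng.randrange(sharp_clip_lengths[clip])))
    return refs


# Structured benchmark: blurry frame 50, gap of 10 frames.
gopro_refs = temporal_references(50, 10, num_frames=100)

# In-the-wild: 11 sharp clips; 1800 frames each is an assumed length (30 s at 60 fps).
phone_refs = random_references([1800] * 11, n=6, rng=random.Random(0))
```

With these assumptions, `gopro_refs` contains frames 38 to 40 and 60 to 62, while `phone_refs` has no temporal relation to any blurry frame.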

Experiments

Quantitative Comparison

Goal: Compare BluRef against existing unsupervised deblurring methods (DualGAN, UID-GAN, UAUD) and supervised upper-bounds (NAFNet, Restormer trained on paired data) across two benchmarks — GoPro (synthetic blur) and RB2V (real blur) — with temporal gaps Δ = 1, 10, and 20 frames.

Key finding: BluRef with Progressive Averaging consistently outperforms all unsupervised baselines by a large margin. On the real-world RB2V dataset, NAFNet-BluRef (Prog.) achieves 27.87 dB — surpassing the supervised Restormer upper-bound of 27.43 dB, despite using zero paired training data.

Comparison of deblurring methods on GoPro and RB2V

PSNR↑ / SSIM↑ scores (higher is better for both).

Method	GoPro (Δ=1)	GoPro (Δ=10)	GoPro (Δ=20)	RB2V (Δ=1)	RB2V (Δ=10)	RB2V (Δ=20)
Unsupervised Deblurring
DualGAN	22.23/0.721	22.10/0.719	21.24/0.702	21.01/0.512	20.87/0.500	20.92/0.505
UID-GAN	23.42/0.732	23.18/0.724	22.38/0.724	22.22/0.578	22.01/0.551	22.13/0.569
UAUD	24.25/0.792	24.02/0.750	23.77/0.745	22.87/0.590	22.29/0.581	22.28/0.581
BluRef — NAFNet backbone (Ours)
NAFNet–BluRef (Avg.)	29.32/0.933	29.21/0.915	29.15/0.911	25.97/0.783	25.96/0.783	25.65/0.775
NAFNet–BluRef (Seq.)	29.82/0.947	29.68/0.940	29.60/0.940	26.14/0.790	26.02/0.787	25.93/0.780
NAFNet–BluRef (Prog.) ⭐	31.94/0.960	31.87/0.955	31.52/0.947	27.87/0.821	27.72/0.820	27.24/0.812
BluRef — Restormer backbone (Ours)
Restormer–BluRef (Avg.)	27.12/0.905	27.04/0.895	26.98/0.893	25.41/0.816	25.38/0.812	24.78/0.801
Restormer–BluRef (Seq.)	28.46/0.923	28.37/0.920	28.31/0.912	25.22/0.810	25.20/0.811	24.73/0.792
Restormer–BluRef (Prog.)	31.02/0.950	30.97/0.949	30.95/0.938	26.82/0.839	26.76/0.832	26.13/0.829
Supervised Upper-bound
NAFNet (supervised)	33.32 / 0.962			28.54 / 0.824
Restormer (supervised)	32.92 / 0.961			27.43 / 0.849

These RB2V results suggest that dense-matching references can provide a supervision signal comparable to paired data on real-world blur: NAFNet-BluRef (Prog.) at 27.87 dB exceeds the supervised Restormer (27.43 dB), although the supervised NAFNet (28.54 dB) remains higher.
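For readers less familiar with the metric, PSNR is 10·log10(MAX² / MSE), so a difference in dB maps to a ratio of mean squared errors. The sketch below uses the standard definition on flat lists of floats in [0, 1]; the helper names are hypothetical and this is not the evaluation code used for the table.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio(db_gain):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-db_gain / 10.0)


example = psnr([0.5, 0.5], [0.4, 0.6])  # every pixel is off by 0.1
ratio = mse_ratio(27.87 - 27.43)        # the RB2V gap quoted above
```

The 0.44 dB gap between 27.87 and 27.43 corresponds to roughly a 10% reduction in MSE.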

Qualitative Comparison

Visual examples confirm the quantitative trends. Unsupervised baselines produce blurry or artifact-laden outputs, while BluRef recovers fine details — body shapes, facial features, and textures — approaching supervised quality.

Qualitative results on GoPro and RB2V. Unsupervised baselines (DualGAN, UID-GAN, UAUD) fail to recover sharp details. BluRef with NAFNet / Restormer produces results close to the supervised upper-bound and ground truth.

Ablation

Robustness to Temporal Gap

❓ Does BluRef remain effective when reference frames differ significantly from the blurry input due to camera or object motion?

In practice, reference frames may be temporally far from the blurry input, leading to significant content misalignment. We test BluRef's robustness by increasing Δ from 1 to 10 and 20 frames — drastically reducing the overlap between blur and reference content.

Visualizing content gaps. At Δ = 10 and 20, reference frames show substantially different content from the blurry input, with matched regions falling below 40%.

Matched Content Between Blur & Reference

Despite severe content gaps, BluRef's dense matching extracts enough correspondences for effective pseudo-sharp generation.

Dataset	Δ = 10	Δ = 20
GoPro	36.1%	28.4%
RB2V	33.7%	25.2%

All percentages are below 40%, yet BluRef shows only minor PSNR drops (e.g., 31.94→31.52 dB on GoPro, 27.87→27.24 dB on RB2V from Δ=1 to Δ=20). This confirms BluRef is practical for real-world settings where exact temporal alignment is not available.
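The robustness numbers above can be checked with a few lines. The PSNR values are copied from the comparison table; computing the matched percentage as the share of confident pixels in a binary mask is an assumption about how the figure is defined, and `matched_percentage` is a hypothetical helper.

```python
def matched_percentage(mask):
    """Share of pixels with a confident correspondence, in percent."""
    total = sum(len(row) for row in mask)
    matched = sum(1 for row in mask for m in row if m > 0)
    return 100.0 * matched / total


# NAFNet-BluRef (Prog.) PSNR in dB at Δ = 1 and Δ = 20, from the table.
psnr_by_gap = {
    "GoPro": {1: 31.94, 20: 31.52},
    "RB2V": {1: 27.87, 20: 27.24},
}
drops = {name: round(v[1] - v[20], 2) for name, v in psnr_by_gap.items()}

toy_mask = [[1, 0, 0], [0, 1, 0]]  # 2 of 6 pixels matched
toy_share = matched_percentage(toy_mask)
```

The drops come out to 0.42 dB on GoPro and 0.63 dB on RB2V, even though fewer than 30% of pixels are matched at Δ = 20.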

Pseudo-Sharp Quality Over Training

As training progresses, the deblurring network produces cleaner estimates, which in turn provide better inputs for the dense matching module. This virtuous cycle progressively expands the confidence mask coverage and improves pseudo-sharp quality.

Visual progression of pseudo-sharp generation. The binary confidence mask gradually expands as the dense matching captures more correspondences between the deblurred estimate and sharp references. By 400K iterations, the pseudo-sharp image closely resembles the ground truth.

PSNR of pseudo-sharp images over training

Pseudo-sharp PSNR over training iterations on GoPro and RB2V (Δ=1). After ~100K iterations, pseudo-sharp quality converges toward ground truth, validating the self-improving cycle: better deblurred → better matching → better pseudo-GT → better model.

Real-World Generalization

PhoneCraft — Can BluRef Handle Fully Unpaired Real-World Data?

❓ Can BluRef generalize to fully unconstrained real-world settings where blurry and sharp data come from completely separate, unsynchronized recordings?

Why PhoneCraft matters: PhoneCraft is a challenging real-world benchmark consisting of 12 blurry and 11 sharp video clips (30–40 seconds each, 60fps) recorded by a handheld smartphone in unconstrained environments. Unlike GoPro and RB2V, PhoneCraft has no paired ground truth and no temporal alignment between blurry and sharp clips at all. Blurry frames arise from rapid camera or object motion, while sharp clips are captured when the camera is mostly static. References are randomly selected from separate sharp clips for each blurry input, eliminating any temporal correlation.

This experiment directly demonstrates BluRef's practicality: it confirms that our method works with separate, unsynchronized sharp and blurry videos — a setting that is trivially easy to collect in practice, requiring no special equipment or controlled conditions.

What PhoneCraft Demonstrates

📱 Easy Data Collection

Collecting usable training data in BluRef's setting is drastically simpler than assembling paired blur–sharp datasets. Just record blurry and sharp clips of similar scenes — no synchronization needed.

🔀 Works with Separate Videos

BluRef handles completely separate, unsynchronized blurry and sharp recordings. References are randomly sampled, proving the method does not rely on temporal correspondence.

🌍 Real-World Robustness

Strong performance on unconstrained smartphone data — with complex blur patterns, varying motion, and no controlled conditions — confirms generalization beyond curated benchmarks.

Quantitative Results on PhoneCraft & Synergistic Combination

Table 4. Unsupervised models on PhoneCraft (NIQE↓ / FID↓). BluRef + Blur2Blur achieves 8.47 / 5.62 — the best scores by a clear margin.

Result: BluRef alone (10.43/6.45) already outperforms all baselines including Blur2Blur variants. When combined with Blur2Blur (RSBlur), the synergy pushes results to 8.47/5.62, demonstrating that BluRef's pseudo-sharp supervision is complementary to reblurring-based approaches.

Table 5. BluRef + Blur2Blur + supervised deblurring. The combination reaches 33.30/0.963 on GoPro and 29.62/0.872 on RB2V.

Result: Combining BluRef with Blur2Blur and a supervised deblurring network comes within 0.02 dB of the supervised NAFNet upper bound on GoPro (33.30 vs. 33.32 dB) and surpasses it on RB2V (29.62 vs. 28.54 dB).

🏅 Why Does Unsupervised BluRef Surpass Supervised Models?

Supervised models are fundamentally limited by the fidelity of their ground-truth labels. Even when paired data can be collected using specialized hardware, the resulting "sharp" labels are rarely perfect. On RB2V, hardware limitations introduce residual blur into the ground-truth images, causing supervised models to overfit to these imperfect labels.

In contrast, BluRef leverages external sharp reference frames — which can be cleaner than the hardware-limited ground truth — to generate pseudo-sharp supervision that exceeds the quality of available labels. This advantage is further demonstrated on PhoneCraft, where no ground truth exists at all: supervised baselines struggle while BluRef continues to generalize effectively.

As shown in Table 5, the BluRef + Blur2Blur combination essentially matches the supervised upper bound on GoPro (33.30 vs. 33.32 dB) and exceeds it on RB2V (29.62 vs. 28.54 dB), which suggests that reference-based pseudo-GT can overcome the ceiling imposed by imperfect paired data. See the qualitative comparison below:

Qualitative Results on PhoneCraft

Qualitative comparison on PhoneCraft. BluRef recovers text, fine textures, and structural details from real smartphone blur. Combining BluRef with Blur2Blur (RSBlur intermediate) yields the sharpest results — consistent with the quantitative improvements shown in Tables 4 and 5 above.

Citation

BibTeX

@inproceedings{pham2026bluref,
  title     = {BluRef: Unsupervised Image Deblurring with Dense-Matching References},
  author    = {Pham, Bang-Dang and Tran, Anh and Pham, Cuong and Hoai, Minh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision
               and Pattern Recognition (CVPR)},
  year      = {2026}
}

References

[1] Chen et al., Simple Baselines for Image Restoration (NAFNet), ECCV 2022.
[2] Zamir et al., Restormer: Efficient Transformer for High-Resolution Image Restoration, CVPR 2022.
[3] Truong et al., PDC-Net+: Enhanced Probabilistic Dense Correspondence Network, TPAMI 2023.
[4] Truong et al., GLU-Net: Global-Local Universal Correspondence Network and GOCor: Learning Global Correspondence with Local Correlation, CVPR 2020 / NeurIPS 2020.
[5] Yi et al., DualGAN: Unsupervised Dual Learning for Image-to-Image Translation, ICCV 2017.
[6] Lu et al., UID-GAN: Unsupervised Image Deblurring via Disentangled Representations, IEEE T-BIOM 2020.
[7] Pham et al., Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains, CVPR 2024.