CVPR 2026

⭐ Highlight Paper

PixelRush: Ultra-Fast, Training-Free High-Resolution
Image Generation via One-step Diffusion

Hong-Phuc Lai · Phong Nguyen · Anh Tran

Qualcomm AI Research

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.

2K generation: 4s
4K generation: 20s
Max speedup: 35×
#1 on FID · FIDc · IS · CLIP
PixelRush 4K generation results

4K images in under 20 seconds. PixelRush extends any pretrained diffusion model to 2K · 4K · 8K · 16K with state-of-the-art quality.

Abstract

Pre-trained diffusion models excel at generating high-quality images but remain inherently limited by their native training resolution. Recent training-free approaches have attempted to overcome this constraint; however, these methods incur substantial computational overhead, often requiring more than five minutes to produce a single 4K image.


We present PixelRush, the first tuning-free framework for practical high-resolution text-to-image generation. Our method builds upon patch-based inference but eliminates multiple inversion-regeneration cycles. Instead, PixelRush enables efficient patch-based denoising in a low-step regime. To address artifacts from few-step patch blending, we propose Gaussian feathering; to combat oversmoothing, we introduce a noise injection mechanism.


PixelRush generates 4K images in approximately 20 seconds — a 10–35× speedup over state-of-the-art — while maintaining superior visual fidelity across all quantitative metrics.

Key Results

2K generation time: 4 s (vs. 28–87 s for competing methods; up to 22× faster at 2K).
4K generation time: 20 s (vs. 247–680 s for baselines; up to 34× faster at 4K).
FID at 2K ↓: 50.13 (best). Surpasses FreeScale (52.87), DemoFusion (68.46), FouriScale (72.65).

Method

A training-free two-stage pipeline built on four targeted contributions.

PixelRush pipeline

Fig. 1. Base model generates coarse latent → PixelRush patchifies the upsampled latent → shallow DDIM inversion (K=249) → single-step SDXL-Turbo refinement → Gaussian-feathered recomposition. No training, no new weights.

① Partial Inversion

Skip Redundant Denoising Steps

Prior methods perturb to full Gaussian noise (T=999) and run 50 reverse steps. PixelRush inverts only to K=249 — the coarse latent already holds global structure, so the early prefix is wasted compute.

This alone yields a 3.7× speedup (67s→18s) with no quality degradation. The optimal K=249 aligns naturally with SDXL-Turbo's 4-step schedule, enabling single-step inversion and single-step denoising.
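The alignment between K=249 and a 4-step schedule can be checked with a few lines of arithmetic. The sketch below assumes a 1000-step training schedule and "trailing" timestep spacing (the convention commonly used with distilled few-step samplers); both are assumptions for illustration, and `trailing_timesteps` is a hypothetical helper, not a function from the PixelRush code.

```python
def trailing_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Evenly spaced timesteps counted back from the end of the training schedule."""
    step = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step) - 1 for i in range(num_inference_steps)]

# Assumed values: 1000 training timesteps, 4 sampling steps.
schedule = trailing_timesteps(1000, 4)

# The shallowest (least noisy) timestep of the 4-step schedule.
K = schedule[-1]

# Share of the noise trajectory that is never visited when starting at K
# instead of at full noise (T=999).
skipped_fraction = 1 - (K + 1) / 1000

print(schedule)          # the four sampling timesteps
print(K, skipped_fraction)
```

Under these assumptions the last timestep of the 4-step schedule is exactly 249, so a latent inverted to K=249 sits one denoising step away from a clean image, and roughly three quarters of the full trajectory is skipped.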

② One-step Refinement

Leverage Distilled Diffusion Models

Partial inversion pairs naturally with SDXL-Turbo (1 step) for the refinement stage. The distilled model focuses its single step on high-frequency detail; the preserved coarse structure keeps it coherent.

Combined with partial inversion, this delivers a 10–35× total speedup. PixelRush is compatible with any few-step distilled backbone, such as SDXL-Turbo, SD-Turbo, and PixArt-δ.

③ Gaussian Feathering

Eliminate Patch Boundary Artifacts

Naive uniform averaging (MultiDiffusion) produces visible checkerboard seams in the few-step regime. We replace the hard binary overlap mask with a Gaussian-smoothed weight map.

Pixels near a patch center follow that patch more strongly; the boundary is a smooth gradient. This completely eliminates seam artifacts even with single-step refinement.
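A minimal 1-D sketch of the idea, comparing uniform averaging with a Gaussian weight window on two overlapping patches that disagree. The window shape (a Gaussian centered on the patch), the value of `sigma`, and the patch and stride sizes are illustrative assumptions; the exact weight map used by PixelRush is not specified here.

```python
import math

def gaussian_window(size: int, sigma: float) -> list[float]:
    """Weights that peak at the patch center and decay toward its borders."""
    center = (size - 1) / 2
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(size)]

def blend(length, patches, starts, window):
    """Weighted average of overlapping 1-D patches placed at the given offsets."""
    num = [0.0] * length
    den = [0.0] * length
    for patch, start in zip(patches, starts):
        for i, (value, weight) in enumerate(zip(patch, window)):
            num[start + i] += weight * value
            den[start + i] += weight
    return [n / d for n, d in zip(num, den)]

def max_jump(signal):
    """Largest difference between neighbouring samples (a proxy for seam strength)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

PATCH, STRIDE = 64, 32                          # hypothetical sizes
starts = [0, STRIDE]
patches = [[0.0] * PATCH, [1.0] * PATCH]        # two patches that disagree

uniform = blend(96, patches, starts, [1.0] * PATCH)
feathered = blend(96, patches, starts, gaussian_window(PATCH, PATCH / 6))

print(max_jump(uniform), max_jump(feathered))
```

With uniform weights the output steps abruptly at each edge of the overlap region, which is what shows up as a seam. With the Gaussian window the transition between the two patches is spread across the overlap, so the largest neighbour-to-neighbour jump is much smaller.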

④ Noise Injection

Restore High-Frequency Texture

Few-step models over-smooth because they miss the cumulative high-frequency updates of multi-step chains. We inject randomness via spherical interpolation with fresh noise.

This flattens the data distribution and recovers sharpness. Crucially, the same technique degrades multi-step pipelines — it is specifically calibrated to the low-step regime.
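Spherical interpolation (slerp) between a latent and fresh Gaussian noise can be sketched as follows. This is a generic slerp on flat vectors; the injection strength `t=0.1` and the point in the pipeline where the mixing happens are assumptions for illustration, not values taken from the paper.

```python
import math
import random

def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Spherical interpolation from a (t=0) to b (t=1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_theta = math.sin(theta)
    w_a = math.sin((1 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]

rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(16)]   # stand-in for a latent patch
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]    # fresh Gaussian noise

# Hypothetical strength: a small step from the latent toward the noise.
perturbed = slerp(latent, noise, 0.1)
```

Unlike a plain linear mix, slerp between two vectors of equal norm keeps that norm unchanged, which avoids shrinking the latent's overall magnitude when noise is blended in.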

Hierarchical denoising

Fig. 2. Prior methods perturb to full noise and spend early steps re-building global structure already present in the coarse latent. PixelRush skips directly to K=249, saving 75% of inference time.

Speed Comparison

PixelRush is 10–35× faster than competing training-free baselines.

All times measured on a single A100-40GB GPU. PixelRush uses SDXL-Turbo (1 step); baselines use SDXL (50 steps).

Qualitative Comparison

PixelRush produces coherent, sharp outputs at both 2K and 4K — all baselines exhibit distinct failure modes.

Qualitative comparison

Fig. 3. Top: 2K · Bottom: 4K. SDXL-DI: object repetition and unnatural textures. DemoFusion: structural duplication. FouriScale: repetitive grid artifacts. FreeScale: excessive high-frequency noise. PixelRush: sharp, coherent, artifact-free.

Flexible aspect ratios

Fig. 4. Patch-based refinement naturally extends to panoramic and arbitrary aspect ratios — no retraining or modification required.
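Why patch-based refinement is indifferent to aspect ratio can be seen from how the patch grid is laid out: the same sliding window simply covers a wider or taller latent. The sketch below assumes a 128×128 latent patch (SDXL's native 1024 px at 8× VAE downsampling) and a stride of 64; the stride and the helper names are hypothetical, chosen only for illustration.

```python
def window_starts(length: int, patch: int, stride: int) -> list[int]:
    """Start offsets of sliding windows that cover [0, length) completely."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Add a final window flush with the border so no cells are left uncovered.
        starts.append(length - patch)
    return starts

def patch_grid(height: int, width: int, patch: int = 128, stride: int = 64):
    """Top-left corners of all patches for a latent of the given size."""
    return [(top, left)
            for top in window_starts(height, patch, stride)
            for left in window_starts(width, patch, stride)]

square = patch_grid(256, 256)     # 2048×2048 image in latent space
panorama = patch_grid(256, 512)   # 2048×4096 panorama in latent space

print(len(square), len(panorama))
```

Only the number of patches changes between the square and the panoramic case; each patch is still processed at the backbone's native size.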

Quantitative Results

Evaluated on 1000 prompts from LAION-2B Aesthetics. PixelRush achieves the best score on every metric.

* PixelRush uses SDXL-Turbo (1 step); all baselines use SDXL with 50 steps. FIDc = crop-based FID (local texture quality).

User Study

Blind perceptual preference study — 30 participants, 25 diverse prompts, 750 pairwise comparisons vs. FouriScale, DemoFusion, and FreeScale.

Text Alignment: 82.8%
Image Quality: 84.1%
Visual Structure: 86.5%
PixelRush was strongly preferred in every category. The best competing baseline (FreeScale) scored below 10% across all criteria. Participants consistently cited PixelRush's sharp textures, structural coherence, and freedom from the grid artifacts and object repetition seen in all other methods.

Ablation Study

Incremental ablation on 2K synthesis showing each component's independent contribution.

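| Configuration | Steps | FID ↓ | FIDc ↓ | IS ↑ | Time |
|---|---|---|---|---|---|
| Baseline | 50 | 54.70 | 32.51 | 13.92 | 67s |
| + Partial inversion | 15 | 52.90 | 32.04 | 13.89 | 18s |
| + Few-step model | 1 | 57.23 | 35.66 | 13.65 | 4s |
| + Gaussian blend | 1 | 56.16 | 33.17 | 13.77 | 4s |
| + Noise injection | 1 | 50.13 | 29.13 | 14.32 | 4s |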

All experiments at 2048×2048.

Partial inversion (67s→18s, 3.7×) confirms early denoising is wasteful — quality is preserved since the global structure remains intact.

Few-step model (18s→4s, additional 4.5×) temporarily hurts quality due to checkerboard and over-smoothing artifacts.

Gaussian blending (4s, same cost) partially recovers quality by eliminating seam artifacts.

Noise injection completes the recovery: FID drops to 50.13, actually better than the original 50-step baseline (54.70). The full pipeline is simultaneously the fastest and most accurate.
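The speedup factors quoted in the ablation follow directly from the reported timings. The short check below uses only the numbers from the 2K ablation; the variable names are illustrative.

```python
# Reported 2K timings in seconds and FID scores from the ablation.
time_baseline, time_partial, time_full = 67, 18, 4
fid_baseline, fid_full = 54.70, 50.13

speedup_partial = time_baseline / time_partial    # partial inversion alone
speedup_few_step = time_partial / time_full       # switching to the 1-step model
speedup_total = time_baseline / time_full         # full pipeline vs. ablation baseline
fid_gain = fid_baseline - fid_full                # lower FID is better

print(f"{speedup_partial:.1f}x, {speedup_few_step:.1f}x, {speedup_total:.2f}x, FID -{fid_gain:.2f}")
```

Note that the combined factor here is relative to the 50-step ablation baseline at 2K. The 10–35× headline range is presumably measured against the published competing methods, so the two figures are not directly comparable.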

BibTeX

@inproceedings{lai2026pixelrush,
  title     = {PixelRush: Ultra-Fast, Training-Free High-Resolution Image Generation via One-step Diffusion},
  author    = {Lai, Hong-Phuc and Nguyen, Phong and Tran, Anh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}
