CVPR 2026

⭐ Highlight Paper

PixelRush: Ultra-Fast, Training-Free High-Resolution
Image Generation via One-step Diffusion

Hong-Phuc Lai · Phong Nguyen · Anh Tran

Qualcomm AI Research

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.

About 4 s per 2K image · about 20 s per 4K image · up to 35× speedup · best FID · FIDc · IS · CLIP scores


4K images in about 20 seconds. PixelRush extends pretrained diffusion models to 2K · 4K · 8K · 16K resolution with state-of-the-art quality.

Abstract

Pre-trained diffusion models excel at generating high-quality images but remain inherently limited by their native training resolution. Recent training-free approaches have attempted to overcome this constraint; however, these methods incur substantial computational overhead, often requiring more than five minutes to produce a single 4K image.

We present PixelRush, the first tuning-free framework for practical high-resolution text-to-image generation. Our method builds on patch-based inference but eliminates the repeated inversion-regeneration cycles of prior approaches. Instead, PixelRush performs efficient patch-based denoising in a low-step regime. To address the artifacts that arise when blending patches in the few-step setting, we propose Gaussian feathering; to combat over-smoothing, we introduce a noise injection mechanism.

PixelRush generates 4K images in approximately 20 seconds, a 10–35× speedup over state-of-the-art methods, while delivering superior visual fidelity across all quantitative metrics.

Method

A training-free, two-stage pipeline built on four targeted contributions.

Fig. 1. The base model generates a coarse latent → PixelRush patchifies the upsampled latent → shallow DDIM inversion (K=249) → single-step SDXL-Turbo refinement → Gaussian-feathered recomposition. No training and no new weights.

① Partial Inversion

Skip Redundant Denoising Steps

Prior methods perturb the upsampled latent to full Gaussian noise (T=999) and run 50 reverse steps. PixelRush inverts only to K=249. The coarse latent already encodes the global structure, so the early, high-noise part of the trajectory is wasted compute.

This alone yields a 3.7× speedup (67s→18s) with no quality degradation. The optimal K=249 aligns naturally with SDXL-Turbo's 4-step schedule, enabling single-step inversion and single-step denoising.
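The following arithmetic checks why K=249 suits a 4-step distilled model. It is a sketch under two assumptions. First, 1000 training timesteps. Second, "trailing" timestep spacing, which diffusers schedulers expose as `timestep_spacing="trailing"`. The helper `trailing_timesteps` is our own illustration and does not come from the paper. Under these assumptions the 4-step schedule is 999, 749, 499, 249. K=249 is therefore its final step, and starting there skips roughly three quarters of the noise trajectory.

```python
def trailing_timesteps(num_train_steps: int, num_inference_steps: int) -> list[int]:
    """Evenly spaced timesteps counted back from the last training step ("trailing" spacing)."""
    step_ratio = num_train_steps / num_inference_steps
    # For 1000 / 4: [1000, 750, 500, 250] shifted down by one to valid indices 0..999.
    return [round(num_train_steps - i * step_ratio) - 1 for i in range(num_inference_steps)]


schedule = trailing_timesteps(1000, 4)
print("4-step schedule:", schedule)

K = schedule[-1]  # last (least noisy) timestep of the schedule
# Fraction of the full trajectory t = 999 ... 0 that partial inversion never visits.
skipped = (999 - K) / 999
print(f"K = {K}; skipped fraction of the noise trajectory = {skipped:.0%}")
```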

② One-step Refinement

Leverage Distilled Diffusion Models

Partial inversion pairs naturally with SDXL-Turbo (1 step) for the refinement stage. The distilled model focuses its single step on high-frequency detail; the preserved coarse structure keeps it coherent.

Combined with partial inversion, this delivers a 10–35× total speedup over state-of-the-art methods. PixelRush is compatible with any few-step distilled backbone, including SDXL-Turbo, SD-Turbo, and PixArt-δ.
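Partial inversion and one-step refinement can be written as two applications of the deterministic DDIM update. The first step moves toward noise and the second moves back to a clean latent. This minimal sketch rests on three assumptions:

- an ε-prediction model;
- the "scaled linear" β schedule used by Stable Diffusion/SDXL (β from 0.00085 to 0.012);
- a hypothetical `eps_model(x, t)` callable standing in for the real network.

The names `alphas_cumprod`, `ddim_step`, and `refine_patch` are illustrative. `x` may be a float or a NumPy array.

```python
import math


def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative alpha-bar for the 'scaled linear' beta schedule (assumed, as in SD/SDXL)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    abar, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        abar.append(prod)
    return abar


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM move from noise level alpha-bar `a_from` to `a_to`.

    a_to < a_from adds noise (inversion); a_to > a_from removes it (denoising).
    a_to = 1.0 returns the clean-latent estimate x0_hat directly.
    """
    x0_hat = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_hat + math.sqrt(1.0 - a_to) * eps


def refine_patch(x0, eps_model, K=249):
    """Partial inversion of a clean patch to timestep K, then one denoising step back."""
    abar = alphas_cumprod()
    # Inversion: treat the clean patch as alpha-bar = 1 and jump straight to timestep K.
    x_K = ddim_step(x0, eps_model(x0, 0), 1.0, abar[K])
    # One-step refinement: predict the noise at K and jump back to a clean latent.
    return ddim_step(x_K, eps_model(x_K, K), abar[K], 1.0)


# Toy check with a hypothetical stand-in predictor that always returns the same noise.
def toy_eps(x, t):
    return 0.5


print(refine_patch(1.0, toy_eps))
```

If the noise predictor never changes its answer, the round trip returns the input unchanged. A real distilled model predicts different noise at K than at t=0. In this formulation, that difference is where new detail enters the patch.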

③ Gaussian Feathering

Eliminate Patch Boundary Artifacts

Naive uniform averaging (MultiDiffusion) produces visible checkerboard seams in the few-step regime. We replace the hard binary overlap mask with a Gaussian-smoothed weight map.

Pixels near a patch center follow that patch more strongly, so the transition between overlapping patches becomes a smooth gradient rather than a hard edge. This completely eliminates seam artifacts, even with single-step refinement.
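Below is a minimal NumPy sketch of Gaussian-feathered recomposition. It assumes a separable Gaussian centered on each patch and uses a single channel for brevity. This page does not give the paper's exact kernel or σ. `sigma_frac` (σ as a fraction of patch size) is therefore a hypothetical knob, and `gaussian_weight` and `feather_blend` are illustrative names. Each output pixel is the weighted average of the patches covering it, so pixels near a patch center are dominated by that patch.

```python
import numpy as np


def gaussian_weight(h, w, sigma_frac=0.25):
    """Separable 2-D Gaussian weight map peaked at the patch center.

    sigma_frac (hypothetical) sets sigma as a fraction of the patch height/width.
    """
    ys = np.arange(h) - (h - 1) / 2
    xs = np.arange(w) - (w - 1) / 2
    gy = np.exp(-0.5 * (ys / (sigma_frac * h)) ** 2)
    gx = np.exp(-0.5 * (xs / (sigma_frac * w)) ** 2)
    return np.outer(gy, gx)


def feather_blend(patches, offsets, out_shape, sigma_frac=0.25):
    """Recompose overlapping 2-D patches with Gaussian weights instead of a uniform average.

    patches: list of (h, w) arrays; offsets: list of (top, left); out_shape: (H, W).
    """
    acc = np.zeros(out_shape)
    wsum = np.zeros(out_shape)
    for patch, (top, left) in zip(patches, offsets):
        h, w = patch.shape
        wmap = gaussian_weight(h, w, sigma_frac)
        acc[top:top + h, left:left + w] += wmap * patch
        wsum[top:top + h, left:left + w] += wmap
    if np.any(wsum == 0):
        raise ValueError("some output pixels are not covered by any patch")
    return acc / wsum  # per-pixel weighted average


# Demo: a patch of zeros and a patch of ones that overlap by two columns.
canvas = feather_blend([np.zeros((4, 4)), np.ones((4, 4))], [(0, 0), (0, 2)], (4, 6))
print(np.round(canvas[0], 2))
```

With `sigma_frac=0.25`, the demo row ramps roughly 0, 0, 0.27, 0.73, 1, 1 across the overlap. Uniform averaging (all weights equal, as in MultiDiffusion) gives 0, 0, 0.5, 0.5, 1, 1, with abrupt steps at both overlap edges.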

④ Noise Injection

Restore High-Frequency Texture

Few-step models over-smooth because they miss the cumulative high-frequency updates of multi-step chains. We inject randomness via spherical interpolation with fresh noise.

This flattens the data distribution and recovers sharpness. Crucially, the same technique degrades multi-step pipelines; it is specifically calibrated to the low-step regime.
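Spherical interpolation (slerp) suits mixing a latent with fresh Gaussian noise. In high dimensions, two independent Gaussian samples are nearly orthogonal and have similar norms, and slerp keeps the result on that shell. A linear 50/50 blend would instead shrink the standard deviation to about 0.71. The sketch below is a pure-Python illustration. `inject_noise` and its `strength` parameter are hypothetical. This page does not state where in the pipeline PixelRush applies the injection or with what strength.

```python
import math
import random


def slerp(a, b, t):
    """Spherical linear interpolation between flat vectors a and b (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("slerp is undefined for zero vectors")
    omega = math.acos(max(-1.0, min(1.0, dot / (na * nb))))  # angle between a and b
    if omega < 1e-6:  # (nearly) parallel: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    wa, wb = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def inject_noise(latent, strength, rng):
    """Hypothetical noise injection: slerp the latent toward fresh N(0, 1) noise by `strength` in [0, 1]."""
    fresh = [rng.gauss(0.0, 1.0) for _ in latent]
    return slerp(latent, fresh, strength)


def norm(v):
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(4096)]
noisy = inject_noise(latent, strength=0.2, rng=rng)
lerped = [0.5 * x + 0.5 * rng.gauss(0.0, 1.0) for x in latent]  # naive linear 50/50 mix, for contrast
print(f"norms: latent {norm(latent):.1f}, slerp {norm(noisy):.1f}, linear 50/50 {norm(lerped):.1f}")
```

The slerped latent keeps a norm close to the original. The linear mix comes out noticeably smaller, which is the shrinkage slerp avoids.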

Fig. 2. Prior methods perturb to full noise and spend early steps re-building global structure already present in the coarse latent. PixelRush skips directly to K=249, saving 75% of inference time.

Qualitative Comparison

PixelRush produces coherent, sharp outputs at both 2K and 4K, while each baseline exhibits a distinct failure mode.

Fig. 3. Top: 2K · Bottom: 4K. SDXL-DI: object repetition and unnatural textures. DemoFusion: structural duplication. FouriScale: repetitive grid artifacts. FreeScale: excessive high-frequency noise. PixelRush: sharp, coherent, artifact-free.

Fig. 4. Patch-based refinement extends naturally to panoramas and arbitrary aspect ratios, with no retraining or modification required.
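Because refinement works patch by patch, supporting a panorama only changes how many patches are laid out. The sketch below computes overlapping patch offsets for an arbitrary latent size. The 128×128 patch and the 64-latent stride are assumptions for illustration; 128 is SDXL's native 1024 px divided by the VAE's 8× downsampling. `tile_starts` and `tile_grid` are hypothetical helpers.

```python
def tile_starts(length, patch, stride):
    """Start offsets along one axis so that `patch`-sized windows cover [0, length)."""
    if patch > length:
        raise ValueError("patch is larger than the canvas along this axis")
    if not 0 < stride <= patch:
        raise ValueError("stride must be in (0, patch] so patches overlap or touch")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # shift the last patch inward to end at the border
    return starts


def tile_grid(height, width, patch=128, stride=64):
    """(top, left) offsets of every patch; any height/width works, panoramas included."""
    return [(top, left)
            for top in tile_starts(height, patch, stride)
            for left in tile_starts(width, patch, stride)]


# A 2048x8192-pixel panorama is a 256x1024 latent under 8x VAE downsampling.
grid = tile_grid(256, 1024)
print(len(grid), grid[:3], grid[-1])
```

With these settings, a 2048×8192-pixel panorama (a 256×1024 latent) yields 3 × 15 = 45 patches. On each axis the last patch is shifted inward so that it ends exactly at the border.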

User Study

Blind perceptual preference study — 30 participants, 25 diverse prompts, 750 pairwise comparisons vs. FouriScale, DemoFusion, and FreeScale.

Share of comparisons in which PixelRush was preferred:

Text Alignment: 82.8%

Image Quality: 84.1%

Visual Structure: 86.5%

PixelRush was strongly preferred in every category. The best competing baseline (FreeScale) scored below 10% on every criterion. Participants consistently cited PixelRush's sharp textures, its structural coherence, and the absence of the grid artifacts and object repetition seen in the baselines.

Ablation Study

Incremental ablation on 2K synthesis. Each row adds one component to the configuration above it, which shows that component's contribution on top of the previous ones.

Configuration	Steps	FID ↓	FIDc ↓	IS ↑	Time
Baseline	50	54.70	32.51	13.92	67s
+ Partial inversion	15	52.90	32.04	13.89	18s
+ Few-step model	1	57.23	35.66	13.65	4s
+ Gaussian blend	1	56.16	33.17	13.77	4s
+ Noise injection	1	50.13	29.13	14.32	4s

All experiments are run at 2048×2048.

Partial inversion (67s→18s, 3.7×) confirms that the early denoising steps are wasteful. Quality is preserved, and FID even improves slightly (54.70→52.90), because the global structure in the coarse latent remains intact.

Switching to the few-step model (18s→4s, a further 4.5×) temporarily hurts quality (FID 52.90→57.23) because of checkerboard seams and over-smoothing.

Gaussian blending (still 4s) partially recovers quality (FID 57.23→56.16) by eliminating seam artifacts.

Noise injection completes the recovery. FID drops to 50.13, better than the original 50-step baseline (54.70), and FIDc and IS also reach their best values in the table. The full pipeline is therefore both the fastest and the best-scoring configuration.
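The per-component speedups compound multiplicatively. The short sketch below recomputes them from the ablation table; the numbers are copied verbatim and nothing here is new data. Against this 50-step baseline, the full pipeline is 67/4 = 16.75× faster and improves FID by 4.57 points. The 10–35× range in the abstract is measured against state-of-the-art methods, not against this baseline.

```python
# (configuration, FID, seconds per 2048x2048 image), copied from the ablation table
rows = [
    ("Baseline", 54.70, 67),
    ("+ Partial inversion", 52.90, 18),
    ("+ Few-step model", 57.23, 4),
    ("+ Gaussian blend", 56.16, 4),
    ("+ Noise injection", 50.13, 4),
]
_, base_fid, base_time = rows[0]

summary = []  # (configuration, speedup vs. baseline, FID change vs. baseline)
for name, fid, secs in rows:
    summary.append((name, base_time / secs, fid - base_fid))
    print(f"{name:<20} {base_time / secs:6.2f}x faster   FID {fid - base_fid:+.2f} vs. baseline")
```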
