Recent progress in diffusion-based video editing has shown remarkable potential, and these techniques are increasingly used in practical applications. However, they remain prohibitively expensive and are particularly challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that make mobile video editing feasible. Building upon an existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. Next, we extend classifier-free guidance distillation to multiple modalities, resulting in a threefold on-device speedup. Finally, we reduce the number of sampling steps to one by introducing a novel adversarial distillation scheme that preserves the controllability of the editing process. Together, these optimizations enable video editing at 12 frames per second on mobile devices while maintaining high editing quality.
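The abstract does not name the base image editing model or spell out the guidance rule. Assuming an InstructPix2Pix-style editor that conditions on both the input frame and the text instruction, classifier-free guidance over the two modalities needs three denoiser evaluations per sampling step. The sketch below shows that combination and an MSE distillation target; a hypothetical student that takes the two guidance scales as extra inputs and matches the guided output in a single pass would cut three evaluations to one, in line with the reported threefold speedup. All function names, the toy denoiser, and the choice of MSE are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature: (latent, image_cond, text_cond) -> noise estimate.
# None stands for a dropped ("null") condition, as in classifier-free guidance training.
Denoiser = Callable[[Vector, Optional[Vector], Optional[Vector]], Vector]


def dual_cfg(eps: Denoiser, z: Vector, c_img: Vector, c_txt: Vector,
             s_img: float, s_txt: float) -> Vector:
    """Two-condition classifier-free guidance (InstructPix2Pix-style, assumed).

    Costs three denoiser evaluations per sampling step.
    """
    e_none = eps(z, None, None)    # pass 1: unconditional
    e_img = eps(z, c_img, None)    # pass 2: input frame only
    e_both = eps(z, c_img, c_txt)  # pass 3: input frame + instruction
    return [n + s_img * (i - n) + s_txt * (b - i)
            for n, i, b in zip(e_none, e_img, e_both)]


def distillation_loss(student_out: Vector, teacher_out: Vector) -> float:
    """Mean squared error between a one-pass student and the guided teacher.

    A hypothetical student would receive (s_img, s_txt) as extra inputs and be
    trained so that one forward pass matches dual_cfg's three-pass output.
    """
    n = len(teacher_out)
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n


def toy_denoiser(z: Vector, c_img: Optional[Vector],
                 c_txt: Optional[Vector]) -> Vector:
    """Toy linear stand-in for a U-Net, used only to exercise the code above."""
    out = [0.1 * v for v in z]
    if c_img is not None:
        out = [o + 0.5 * c for o, c in zip(out, c_img)]
    if c_txt is not None:
        out = [o + 0.25 * c for o, c in zip(out, c_txt)]
    return out


calls: List[int] = []


def counting_denoiser(z: Vector, c_img: Optional[Vector],
                      c_txt: Optional[Vector]) -> Vector:
    """Wraps the toy denoiser and records each forward pass."""
    calls.append(1)
    return toy_denoiser(z, c_img, c_txt)


guided = dual_cfg(counting_denoiser, [1.0, 2.0], [2.0, 0.0], [4.0, 4.0],
                  s_img=1.5, s_txt=7.5)
print(len(calls), [round(g, 6) for g in guided])  # 3 [9.1, 7.7]
```

With both scales set to 1, the combination collapses to the fully conditioned estimate, and with both set to 0 it returns the unconditional one, which is a quick sanity check on the formula.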
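In this work, we introduce several optimizations to accelerate diffusion-based video editing: an optimized architecture with a lightweight autoencoder, classifier-free guidance distillation extended to multiple modalities, and an adversarial distillation scheme that reduces sampling to a single step.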
These optimizations enable video editing at 12 frames per second on mobile devices, a significant step toward real-time text-guided video editing on mobile platforms.
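One way to see how these optimizations compound is to count denoiser evaluations per edited frame. The sketch below assumes that guidance over the image and text conditions costs three passes per step and uses a hypothetical baseline of 20 sampling steps; the source specifies neither number, so both are illustrative only. It also computes the per-frame time budget implied by 12 frames per second, which has to cover the whole pipeline, including the autoencoder.

```python
def denoiser_calls_per_frame(steps: int, passes_per_step: int) -> int:
    """Number of denoiser evaluations needed to edit one frame."""
    return steps * passes_per_step


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame at a given frame rate."""
    return 1000.0 / fps


# HYPOTHETICAL baseline step count; the source does not state one.
BASELINE_STEPS = 20

# Assumed: two-condition guidance needs 3 passes per step before distillation.
baseline = denoiser_calls_per_frame(BASELINE_STEPS, passes_per_step=3)
guidance_distilled = denoiser_calls_per_frame(BASELINE_STEPS, passes_per_step=1)
one_step = denoiser_calls_per_frame(1, passes_per_step=1)

print(baseline, guidance_distilled, one_step)  # 60 20 1
print(round(frame_budget_ms(12), 1))  # 83.3
```

Under these assumptions, guidance distillation removes a constant factor of three, while single-step sampling removes the dependence on the step count entirely, so the remaining per-frame cost is one denoiser pass plus encoding and decoding.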
[Figure: example edits of input videos with the instructions "In chinese ink style", "In caricature style", "In pop art style", "Turn him into silver surfer", "Add wrinkles", "Add sunglasses", "In pixar 3d style", "Turn him into vampire", "In pencil drawing style", "Make him bronze", "Turn him into hulk", "In Minecraft style".]
[Figure: further input videos, each shown edited in several panels with the same instruction: "In Monet style", "Make him wooden", "Make it desert", "Turn the swan into flamingo", "Add grass", "Add snow", "Make him zombie", "Make him yeti", "Make her hair blonde", "Add fire".]