DocWaveDiff: A Predict-and-Refine Approach for Document Image Enhancement with Wavelet U-Nets and Diffusion Models
Abstract
OCR and document layout analysis algorithms are essential components of AI-based document systems, yet they are typically trained on clean, degradation-free images. When applied to degraded documents, such as blurred scans or pages obscured by handwritten text, their performance drops significantly. To address this issue, we propose DocWaveDiff, a novel document restoration method based on a predict-and-refine diffusion framework incorporating wavelet U-Nets. Given a degraded image patch and, optionally, its prior features, our Early Predictor generates an initial restoration, which is then refined by a Denoiser Refiner that estimates the residual image. Combining these two outputs yields the final restored result. We evaluate DocWaveDiff on multiple public benchmarks and demonstrate its strong performance across various document degradation scenarios, including deblurring and handwriting removal. Our results confirm that integrating wavelet transforms into the predict-and-refine framework enhances restoration quality and supports more robust document understanding systems.
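To make the two-stage pipeline concrete, the following PyTorch sketch illustrates the predict-and-refine combination described above. The module names `early_predictor` and `denoiser_refiner` are placeholders for the paper's wavelet U-Nets, whose internals (and the iterative diffusion sampling loop) are not reproduced here; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PredictAndRefine(nn.Module):
    """Minimal sketch of a predict-and-refine restoration pipeline.

    `early_predictor` and `denoiser_refiner` are hypothetical stand-ins
    for the wavelet U-Nets described in the abstract.
    """

    def __init__(self, early_predictor: nn.Module, denoiser_refiner: nn.Module):
        super().__init__()
        self.early_predictor = early_predictor
        self.denoiser_refiner = denoiser_refiner

    def forward(self, degraded: torch.Tensor) -> torch.Tensor:
        # Stage 1: coarse restoration predicted directly from the degraded patch.
        initial = self.early_predictor(degraded)
        # Stage 2: estimate the residual between the coarse restoration and the
        # clean target. A single call stands in for the diffusion sampling loop.
        residual = self.denoiser_refiner(torch.cat([degraded, initial], dim=1))
        # Combine: final restoration = initial prediction + estimated residual.
        return initial + residual
```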