MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction

Kyoto University, Monash University Malaysia, Rice University, Tamkang University
Graphs (top) and table (bottom) compare the average-score metric (ASM) against total training and inference times for [Suh et al. 2022], [Ju et al. 2024], and the proposed MFE-GAN (Ours), measured on the Benchmark Dataset using an NVIDIA GeForce RTX 4090 GPU. The proposed MFE-GAN, using U-Net & EfficientNetV2-S as the generator, trains 16% to 79% faster than the compared methods, while inference time is reduced by 17% to 35%.

Abstract

Document image enhancement and binarization are commonly performed before document analysis and recognition tasks to improve the efficiency and accuracy of techniques such as optical character recognition (OCR). This is because directly recognizing text in degraded documents, particularly in color images, often yields unsatisfactory results. Training an independent generative adversarial network (GAN) for each color channel can generate images in which shadows and noise are effectively removed, which in turn facilitates efficient text information extraction. However, employing multiple GANs for different color channels requires long training and inference times. To reduce both the training and inference times of models for document image enhancement and binarization, we propose MFE-GAN, an efficient GAN-based framework with multi-scale feature extraction (MFE), which incorporates Haar wavelet transformation (HWT) and normalization to process document images before feeding them into GANs for training. In addition, we present novel generators, discriminators, and loss functions to improve the model's performance, and conduct ablation studies to demonstrate their effectiveness. Experimental results on the Benchmark, Nabuco, and CMATERdb datasets show that the proposed MFE-GAN significantly reduces both the total training and inference times while maintaining performance comparable to state-of-the-art methods.

Framework


We propose MFE-GAN, an efficient GAN-based framework that incorporates a novel multi-scale feature extraction (MFE) module, together with the generator, discriminator, and loss functions. This framework adopts a three-stage architecture: Stage 1 – Document Image Processing, Stage 2 – Document Image Enhancement, and Stage 3 – Document Image Binarization.
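The Haar wavelet transformation and normalization mentioned for the image-processing stage can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`haar_dwt2`, `normalize_to_uint8`, `process_rgb`), the choice to keep only the low-frequency sub-band, and the min-max normalization to [0, 255] are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(channel):
    """Single-level 2D orthonormal Haar transform of one image channel.

    Returns four sub-bands, each half the height and width of the input.
    Sub-band naming conventions vary between libraries; the names used
    here are illustrative only.
    """
    x = np.asarray(channel, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    row_detail = (a + b - c - d) / 2.0  # difference between the two rows
    col_detail = (a - b + c - d) / 2.0  # difference between the two columns
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return approx, row_detail, col_detail, diag_detail


def normalize_to_uint8(band):
    """Min-max normalize a sub-band to the 0..255 range (assumed scheme)."""
    lo, hi = band.min(), band.max()
    if hi == lo:
        # A constant band carries no contrast; map it to zeros.
        return np.zeros(band.shape, dtype=np.uint8)
    return np.round((band - lo) / (hi - lo) * 255.0).astype(np.uint8)


def process_rgb(image):
    """Split an H x W x 3 image into channels and keep each channel's
    normalized low-frequency sub-band (an assumption of this sketch)."""
    return [normalize_to_uint8(haar_dwt2(image[..., k])[0]) for k in range(3)]
```

Each output is a quarter of the input area, which is one plausible reason such preprocessing lowers the cost of the later stages. The real pipeline may combine or select sub-bands differently.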

Dataset


Examples from the three datasets used in this work: (a) Benchmark, (b) Nabuco, and (c) CMATERdb. Original images are shown on the left, and their corresponding binarized ground truth on the right. For the Benchmark Dataset, the training set comprises images from DIBCO 2009 (10 images), H-DIBCO 2010 (10 images), H-DIBCO 2012 (14 images), Bickley Diary (7 images), PHIBD (15 images), and SMADI (87 images). The testing set consists of images from DIBCO 2011 (16 images), DIBCO 2013 (16 images), H-DIBCO 2014 (10 images), H-DIBCO 2016 (10 images), DIBCO 2017 (20 images), H-DIBCO 2018 (10 images), and DIBCO 2019 (20 images).

Comparison


Qualitative comparison of binarization methods on a sample from the DIBCO 2013 dataset: (a) Input, (b) Ground-Truth, (c) Otsu [Otsu 1979], (d) Niblack [Niblack 1985], (e) Sauvola [Sauvola et al. 2000], (f) Vo [Vo et al. 2018], (g) He [He et al. 2019], (h) Zhao [Zhao et al. 2019], (i) Suh [Suh et al. 2022], (j) Ju [Ju et al. 2024], (k) MFE-GAN (Ours).

 

 


Qualitative comparison of binarization methods on a sample from the DIBCO 2019 dataset: (a) Input Image, (b) Ground-Truth, (c) Suh [Suh et al. 2022], (d) Ju [Ju et al. 2024], (e) MFE-GAN (Ours).

References

[Suh et al. 2022] Sungho Suh, Jihun Kim, Paul Lukowicz, Yong Oh Lee. 2022. Two-stage generative adversarial networks for binarization of color document images. Pattern Recognition.

[Ju et al. 2024] Rui-Yang Ju, Yu-Shian Lin, Yanlin Jin, Chih-Chia Chen, Chun-Tse Chien, Jen-Shiun Chiang. 2024. Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks. Knowledge-Based Systems.

[Otsu 1979] Nobuyuki Otsu. 1979. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics.

[Niblack 1985] Wayne Niblack. 1985. An Introduction to Digital Image Processing. Strandberg Publishing Company.

[Sauvola et al. 2000] Jaakko Sauvola, Matti Pietikäinen. 2000. Adaptive document image binarization. Pattern Recognition.

[Vo et al. 2018] Quang Nhat Vo, Soo Hyung Kim, Hyung Jeong Yang, Gueesang Lee. 2018. Binarization of degraded document images based on hierarchical deep supervised network. Pattern Recognition.

[He et al. 2019] Sheng He, Lambert Schomaker. 2019. DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognition.

[Zhao et al. 2019] Jinyuan Zhao, Cunzhao Shi, Fuxi Jia, Yanna Wang, Baihua Xiao. 2019. Document image binarization with cascaded generators of conditional generative adversarial networks. Pattern Recognition.

BibTeX

Conference Paper (APSIPA ASC 2025):

        @inproceedings{ju2025efficient,
          title={Efficient Generative Adversarial Networks for Color Document Image Enhancement and Binarization Using Multi-Scale Feature Extraction},
          author={Ju, Rui-Yang and Wong, KokSheik and Chiang, Jen-Shiun},
          booktitle={2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
          pages={1898--1903},
          year={2025},
          organization={IEEE}
        }
      
Journal Paper (Under Review):

        @article{ju2025mfegan,
          title={MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction},
          author={Ju, Rui-Yang and Wong, KokSheik and Jin, Yanlin and Chiang, Jen-Shiun},
          journal={arXiv preprint arXiv:2512.14114},
          year={2025}
        }