Restoration-Guided Kuzushiji Character Recognition Framework under Seal Interference

Kyoto University
Overview of the proposed restoration-guided Kuzushiji character recognition (RG-KCR) framework. Document restoration is introduced to mitigate seal interference before character classification. The framework consists of three stages: Kuzushiji character detection (Stage 1), Kuzushiji document restoration (Stage 2), and Kuzushiji character classification (Stage 3).
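The three-stage flow can be sketched as a small composition of functions. This is only an illustrative sketch, not the authors' implementation: the names `run_rg_kcr`, `detect`, `restore`, and `classify` are hypothetical, and it assumes that Stage 1 boxes are computed on the raw page and that Stage 3 classifies the corresponding regions of the restored page.

```python
from typing import Any, Callable, List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates; a hypothetical box format.
Box = Tuple[int, int, int, int]


def run_rg_kcr(
    image: Any,
    detect: Callable[[Any], List[Box]],
    restore: Callable[[Any], Any],
    classify: Callable[[Any, Box], str],
) -> List[Tuple[Box, str]]:
    """Chain the three stages; every callable is a placeholder for a real model."""
    boxes = detect(image)        # Stage 1: character detection
    restored = restore(image)    # Stage 2: document restoration (seal suppression)
    # Stage 3: classify each detected region on the restored page (assumed order).
    return [(box, classify(restored, box)) for box in boxes]
```

Passing the stages in as callables keeps the sketch independent of any particular detector, restoration method, or classifier.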

Motivation


To the best of our knowledge, existing Kuzushiji character recognition systems have not explicitly addressed seal interference, even though seals frequently overlap with characters and substantially degrade recognition performance. The example shows recognition results produced by different methods under seal interference. The left column shows the input Kuzushiji characters “尚書堂梓”, while the right column presents the corresponding recognition outputs of three methods: Komonjo Camera (Fuminoha) [TOPPAN 2023], NDLkotenOCR-Lite [Aoike et al. 2024], and Metom [Imajuku et al. 2024]. The first row uses the raw document as input, whereas the second row shows the recognition results after applying our proposed document restoration method.

Data Correction

The raw Kuzushiji document images are obtained from the Center for Open Data in the Humanities (CODH) dataset [NIJL 2016], held by the National Institute of Japanese Literature (NIJL). We select 13 representative documents and exclude pages without Kuzushiji characters, resulting in a dataset of 1,000 images. During annotation review, 267 images were found to contain incomplete labels, which we manually corrected by adding the missing character bounding boxes. The examples show incomplete annotations in the raw dataset: red boxes are our manual annotations and green boxes are the original ones.

Synthetic Data Generation

Comparison of raw (real) and synthetic Kuzushiji documents. The seals in the first row are physically present in the original documents, while those in the second row are synthetically added.

Stage 1 Output


Qualitative detection results produced by YOLOv12-medium [Tian et al. 2025] on Kuzushiji document images from the constructed dataset.

Stage 2 Output


Qualitative comparison of raw and restored Kuzushiji documents, with hyperparameters τr = 90 and τrg = τrb = 1.3.
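The page does not define how τr, τrg, and τrb are applied. One plausible reading, offered here purely as an assumption, is that τr is a minimum red intensity and that τrg and τrb are minimum red-to-green and red-to-blue ratios, so that strongly red pixels are treated as seal ink and replaced with a background color. The sketch below illustrates that reading in plain Python; the function names and the white background fill are hypothetical, and the actual restoration method may differ.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def is_seal_pixel(r: int, g: int, b: int,
                  tau_r: float = 90, tau_rg: float = 1.3,
                  tau_rb: float = 1.3) -> bool:
    """Assumed rule: red is bright enough and dominates both green and blue."""
    # Written as products rather than ratios to avoid division by zero.
    return r > tau_r and r > tau_rg * g and r > tau_rb * b


def suppress_seal(pixels: List[List[Pixel]],
                  background: Pixel = (255, 255, 255),
                  **thresholds: float) -> List[List[Pixel]]:
    """Return a copy of the image with seal-like pixels replaced by background."""
    return [
        [background if is_seal_pixel(*px, **thresholds) else px for px in row]
        for row in pixels
    ]
```

Under this rule, dark ink pixels fall below the red-intensity threshold and paper-colored pixels fail the ratio tests, so both are left unchanged while saturated red regions are cleared.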

Final Output


Kuzushiji document images and their corresponding final outputs produced by the proposed RG-KCR framework. For visualization, green bounding boxes and the associated modern Japanese characters are overlaid on the documents with a uniform font size of 64 pixels to enhance readability. With this overlay, readers can interpret the content of the original Kuzushiji documents directly.

References

[TOPPAN 2023] TOPPAN Inc., ふみのは (Fuminoha): 古文書カメラ (Komonjo Camera) (2023), https://camera.fuminoha.jp.

[Aoike et al. 2024] Aoike, T., Development of ndlkotenocr-lite, a lightweight OCR that runs at high speed in a CPU environment. In: IPSJ SIG Computers and the Humanities Symposium. pp. 181–186 (2024).

[Imajuku et al. 2024] Imajuku, Y., Clanuwat, T., Metom (2024), https://huggingface.co/SakanaAI/Metom.

[NIJL 2016] National Institute of Japanese Literature (国文学研究資料館), Japanese Classical Cursive Script Kuzushiji Dataset (日本古典籍くずし字データセット) (2016), https://doi.org/10.20676/00000340.

[Tian et al. 2025] Tian, Y., Ye, Q., Doermann, D., YOLOv12: Attention-centric real-time object detectors. In: Advances in Neural Information Processing Systems (2025).

BibTeX

If you use our dataset, please cite the paper below:


      
The following is the citation of the original Kuzushiji dataset; please cite it when using our benchmark dataset:

  『日本古典籍くずし字データセット』 (国文研所蔵/CODH加工) doi:10.20676/00000340