I am currently pursuing a Master of Science (M.S.) at the Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei City, Taiwan. I am a member of NTU imLab, advised by Prof. Yi-Ping Hung, and my current research focuses on 3D Gaussian Splatting (3DGS).
I received my Bachelor of Science (B.S.) degree in Electrical and Computer Engineering from Tamkang University, New Taipei City, Taiwan, in 2023, where I ranked first in my department. From 2020 to 2023, I conducted research in the Advanced Mixed-Operation System Laboratory (AMOS Lab.) at Tamkang University, focusing on deep learning and computer vision under the supervision of Prof. Jen-Shiun Chiang.
My primary research areas include Deep Learning (DL), Neural Networks (NN), Computer Vision (CV), and Image Processing (IP). In addition, I have contributed to projects in other fields, such as Natural Language Processing (NLP) and Speech & Audio. I have published 10+ papers in international journals and conferences.
Further details about my background are available in my CV (updated in June 2025).
News
- 2025.01: One paper was accepted by ICRA 2025 (Project).
- 2025.01: Our split GRAZPEDWRI-DX dataset is now available (Project).
- 2024.09: One paper was accepted by Knowledge-Based Systems (IF=7.2) (GitHub).
Publications
3D Stylized Head Avatar

ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars
Rui-Yang Ju, Sheng-Yen Huang, Yi-Ping Hung
Abstract
The introduction of 3D Gaussian blendshapes has enabled the real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based framework, has become widely used for facial image stylization. To extend Toonify to synthesizing diverse stylized 3D head avatars using Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we employ an improved StyleGAN to generate a stylized video from the input video frames, which addresses a limitation of normal StyleGAN: the need to crop aligned faces at a fixed resolution as preprocessing. This yields a more stable video, which enables the Gaussian blendshapes to better capture the high-frequency details of the video frames and to efficiently generate high-quality animation in the next stage. In Stage 2 (Gaussian blendshapes synthesis), we learn a stylized neutral head model and a set of expression blendshapes from the generated video. By combining the neutral head model with the expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on the benchmark dataset using two styles: Arcane and Pixar.
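A minimal sketch of the Stage 2 composition described above: the rendered avatar is the stylized neutral head model plus an expression-weighted sum of blendshape offsets. The array shapes, parameter layout, and usage values below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def compose_avatar(neutral, blendshape_offsets, expression_weights):
    """Combine a neutral Gaussian head model with expression blendshapes.

    neutral:            (N, D) per-Gaussian parameters (positions, colors, ...).
    blendshape_offsets: (K, N, D) learned offsets from the neutral model.
    expression_weights: (K,) expression coefficients for the current frame.
    """
    # Linear blendshape model: neutral + weighted sum of expression offsets.
    return neutral + np.tensordot(expression_weights, blendshape_offsets, axes=1)

# Hypothetical usage: 10k Gaussians, 52 expression blendshapes.
neutral = np.zeros((10_000, 59))      # placeholder neutral head parameters
offsets = np.zeros((52, 10_000, 59))  # placeholder learned expression offsets
weights = np.random.rand(52)          # per-frame expression coefficients
avatar = compose_avatar(neutral, offsets, weights)  # (10_000, 59)
```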

Document Binarization

Three-stage Binarization of Color Document Images Based on Discrete Wavelet Transform and Generative Adversarial Networks
Rui-Yang Ju, Yu-Shian Lin, Yanlin Jin, Chih-Chia Chen, Chun-Tse Chien, Jen-Shiun Chiang
Abstract
The efficient extraction of text information from the background in degraded color document images is an important challenge in the preservation of ancient manuscripts. Imperfect preservation has led to different types of degradation over time, such as page yellowing, staining, and ink bleeding, which seriously affect the results of document image binarization. This work proposes an effective three-stage network method for the image enhancement and binarization of degraded documents using generative adversarial networks (GANs). Specifically, in Stage-1, we first split the input images into multiple patches, and then split these patches into four single-channel patch images (gray, red, green, and blue). Three of the single-channel patch images (red, green, and blue) are then processed by the discrete wavelet transform (DWT) with normalization. In Stage-2, we use four independent generators to train GAN models, one per channel, on the processed patch images to extract color foreground information. Finally, in Stage-3, we train two independent GAN models on the outputs of Stage-2 and the resized original input images (512 × 512) as the local and global predictions to obtain the final outputs. The experimental results show that the Avg-Score metrics of the proposed method are 77.64, 77.95, 79.05, 76.38, 75.34, and 77.00 on the (H)-DIBCO 2011, 2013, 2014, 2016, 2017, and 2018 datasets, respectively, which is at the state-of-the-art level.
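A minimal sketch of the Stage-1 channel splitting and DWT step described above, using OpenCV and PyWavelets; the `'haar'` wavelet and the min-max normalization are assumed choices, not details taken from the paper.

```python
import cv2
import numpy as np
import pywt

def stage1_preprocess(patch_bgr):
    """Split a color patch into gray/R/G/B channels; apply DWT to R, G, B."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(patch_bgr)

    processed = []
    for channel in (r, g, b):
        # Single-level 2D discrete wavelet transform; keep the approximation band.
        approx, _details = pywt.dwt2(channel.astype(np.float32), "haar")
        # Min-max normalization to [0, 1] (assumed normalization scheme).
        approx = (approx - approx.min()) / (approx.max() - approx.min() + 1e-8)
        processed.append(approx)
    return gray, processed
```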
CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization, Rui-Yang Ju, Yu-Shian Lin, Jen-Shiun Chiang, Chih-Chia Chen, Wei-Han Chen, Chun-Tse Chien. (PRICAI 2023)
Fracture Detection

Fracture Detection in Pediatric Wrist Trauma X-ray Images Using YOLOv8 Algorithm
Rui-Yang Ju, Weiming Cai
Abstract
Hospital emergency departments frequently receive a large number of bone fracture cases, with pediatric wrist trauma fractures accounting for the majority of them. Before pediatric surgeons perform surgery, they need to ask patients how the fracture occurred and analyze the fracture by interpreting X-ray images. The interpretation of X-ray images often requires a combination of expertise from radiologists and surgeons, which demands time-consuming specialized training. With the rise of deep learning in the field of computer vision, applying network models to fracture detection has become an important research topic. In this paper, we use data augmentation to improve the performance of the YOLOv8 algorithm (the latest version of You Only Look Once at the time) on GRAZPEDWRI-DX, a public pediatric wrist trauma X-ray dataset. The experimental results show that our model reaches state-of-the-art (SOTA) mean average precision (mAP 50). Specifically, the mAP 50 of our model is 0.638, which is higher than the 0.634 and 0.636 of the improved YOLOv7 and original YOLOv8 models. To enable surgeons to use our model for fracture detection on pediatric wrist trauma X-ray images, we have designed the application "Fracture Detection Using YOLOv8 App" to assist surgeons in diagnosing fractures, reducing the probability of analysis errors, and providing more useful information for surgery.
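A minimal sketch of fine-tuning YOLOv8 on GRAZPEDWRI-DX with the Ultralytics API, as the abstract describes; the model size, dataset YAML path, and augmentation hyperparameters are hypothetical placeholders rather than the paper's settings.

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 checkpoint (the size variant here is a guess).
model = YOLO("yolov8m.pt")

# Fine-tune on the pediatric wrist X-ray dataset with data augmentation.
model.train(
    data="GRAZPEDWRI-DX.yaml",  # hypothetical dataset config path
    epochs=100,
    imgsz=640,
    degrees=10.0,  # rotation augmentation
    fliplr=0.5,    # horizontal-flip probability
)

# Evaluate mAP 50 on the validation split.
metrics = model.val()
print(metrics.box.map50)
```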
YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection, Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Enkaer Xieerke, Jen-Shiun Chiang. (IEEE Access 2025)
YOLOv9 for Fracture Detection in Pediatric Wrist Trauma X-ray Images, Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Jen-Shiun Chiang. (Electronics Letters 2024)
YOLOv8-ResCBAM: YOLOv8 Based on An Effective Attention Module for Pediatric Wrist Fracture Detection, Rui-Yang Ju, Chun-Tse Chien, Jen-Shiun Chiang. (ICONIP 2024)
ORB-SfMLearner

ORB-SfMLearner: ORB-Guided Self-supervised Visual Odometry with Selective Online Adaptation
Yanlin Jin, Rui-Yang Ju, Haojun Liu, Yuzhong Zhong
Abstract
Deep visual odometry, despite extensive research, still faces limitations in accuracy and generalizability that prevent its broader application. To address these challenges, we propose an Oriented FAST and Rotated BRIEF (ORB)-guided visual odometry with selective online adaptation, named ORB-SfMLearner. We present a novel use of ORB features for learning-based ego-motion estimation, leading to more robust and accurate results. We also introduce a cross-attention mechanism to enhance the explainability of PoseNet, revealing that the driving direction of the vehicle can be explained through the attention weights. To improve generalizability, our selective online adaptation allows the network to rapidly and selectively adjust to the optimal parameters across different domains. Experimental results on the KITTI and vKITTI datasets show that our method outperforms previous state-of-the-art deep visual odometry methods in terms of ego-motion accuracy and generalizability.
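A minimal sketch of extracting the ORB features that guide ego-motion estimation, using OpenCV. Rasterizing keypoint responses into a guidance map fed alongside the RGB input is an assumed interface to the pose network, not the paper's exact design.

```python
import cv2
import numpy as np

def extract_orb_guidance(frame_bgr, n_features=1000):
    """Detect ORB keypoints and build a sparse guidance map for a pose network."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)

    # Rasterize keypoint responses into a single-channel map (a hypothetical
    # way of injecting sparse ORB cues into a CNN input).
    guidance = np.zeros(gray.shape, dtype=np.float32)
    for kp in keypoints:
        x, y = int(kp.pt[0]), int(kp.pt[1])
        guidance[y, x] = kp.response
    return guidance, descriptors
```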

Super Resolution

Resolution enhancement processing on low quality images using swin transformer based on interval dense connection strategy
Rui-Yang Ju, Chih-Chia Chen, Jen-Shiun Chiang, Yu-Shian Lin, Wei-Han Chen, Chun-Tse Chien
Abstract
Transformer-based methods have demonstrated remarkable performance for image super-resolution in comparison to methods based on convolutional neural networks (CNNs). However, using a self-attention mechanism like SwinIR (Image Restoration Using Swin Transformer) to extract feature information from images requires a significant amount of computational resources, which limits its application on low-computing-power platforms. To improve model feature reuse, this research work proposes the Interval Dense Connection Strategy, which connects different blocks according to a newly designed algorithm. We apply this strategy to SwinIR and present a new model, named SwinOIR (Object Image Restoration Using Swin Transformer). For image super-resolution, an ablation study is conducted to demonstrate the positive effect of the Interval Dense Connection Strategy on model performance. Furthermore, we evaluate our model on various popular benchmark datasets and compare it with other state-of-the-art (SOTA) lightweight models. For example, SwinOIR obtains a PSNR of 26.62 dB for x4 upscaling image super-resolution on the Urban100 dataset, which is 0.15 dB higher than the SOTA model SwinIR. For real-life application, this work applies the latest version of You Only Look Once (YOLOv8) and the proposed model to perform object detection and real-life image super-resolution on low-quality images.
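A minimal sketch of one plausible reading of the Interval Dense Connection Strategy named above: each block's output additionally reuses the feature produced a fixed interval of blocks earlier, rather than all predecessors as in full dense connection. The interval rule, block definition, and dimensions are assumptions; the paper's actual connection algorithm is not reproduced here.

```python
import torch
import torch.nn as nn

class IntervalDenseBlocks(nn.Module):
    """Chain of blocks with skip connections at a fixed interval (assumed rule)."""

    def __init__(self, num_blocks=8, channels=64, interval=2):
        super().__init__()
        self.interval = interval
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_blocks)
        )

    def forward(self, x):
        outputs = [x]
        for i, block in enumerate(self.blocks):
            y = block(outputs[-1])
            j = i + 1 - self.interval
            if j >= 0:
                # Reuse the feature produced `interval` steps earlier.
                y = y + outputs[j]
            outputs.append(y)
        return outputs[-1]

# Hypothetical usage on a 64-channel feature map.
model = IntervalDenseBlocks()
out = model(torch.randn(1, 64, 48, 48))
```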
Professional Services
Journal Reviewer
- Pattern Recognition.
- Knowledge-Based Systems.
- Neurocomputing.
- IEEE Signal Processing Letters.
- Scientific Reports.
- Cognitive Computation.
- International Journal of Multimedia Information Retrieval.
- International Journal of Machine Learning and Cybernetics.
- PLOS ONE.
- Journal of Real-Time Image Processing.
- International Journal of Computational Intelligence Systems.
- The Journal of Supercomputing.
- Frontiers in Computer Science.
- Signal, Image and Video Processing.
- Telecommunication Systems.
- Journal of Engineering.
- Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization.
Conference Committee and Reviewer
- Pacific Rim International Conference on Artificial Intelligence (PRICAI), Wellington, New Zealand, 2025, Program Committee.
- International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 2025, Reviewer.
- IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, Reviewer.
- Pacific Rim International Conference on Artificial Intelligence (PRICAI), Kyoto, Japan, 2024, Program Committee.
- Pacific Rim International Conference on Artificial Intelligence (PRICAI), Jakarta, Indonesia, 2023, Program Committee.
Honors and Scholarships
Honors
- Silver Medal (Top 4%), Google Universal Image Embedding Challenge, ECCV 2022 Competition, 2022.10.
- Excellent Academic Performance Award, Tamkang University, 2022.05.
- The Innovation and Entrepreneurship Competition Excellent Award, Tamkang University, 2021.05.
Scholarships
- National Taiwan University Scholarship, 2023 – 2025, Total: NT$32,000
- National Taiwan University (imLab) Scholarship, 2024, Total: NT$36,000
- Sino International Business Innovation Association (SIBIA) Scholarship, 2021, 2022, 2024, Total: US$900
- Tamkang University Research Scholarship, 2021 – 2022, Total: NT$48,000
- Tamkang University Scholarship (Top 1%), 2021, 2022, Total: NT$20,000
Education
- 2023.09 - 2025.06, Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei City
- 2019.09 - 2023.06, Electrical and Computer Engineering, Tamkang University, New Taipei City