ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars

Rui-Yang Ju Sheng-Yen Huang Yi-Ping Hung
Graduate Institute of Networking and Multimedia
National Taiwan University
We propose an efficient two-stage framework that employs an improved StyleGAN to generate stylized head videos from input video frames and synthesize the corresponding 3D avatars using Gaussian blendshapes. Our method supports real-time synthesis of stylized avatar animations in styles such as Arcane and Pixar.

Abstract

The introduction of 3D Gaussian blendshapes has enabled the real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based method, has become widely used for facial image stylization. To extend Toonify for synthesizing diverse stylized 3D head avatars using Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we adopt an improved StyleGAN to generate the stylized video from the input video frames, overcoming the limitation of standard StyleGAN, which requires cropping aligned faces at a fixed resolution as a preprocessing step. This process produces a more stable stylized video, which enables Gaussian blendshapes to better capture the high-frequency details of the video frames, facilitating the synthesis of high-quality animations in the next stage. In Stage 2 (Gaussian blendshapes synthesis), our method learns a stylized neutral head model and a set of expression blendshapes from the generated stylized video. By combining the neutral head model with expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on benchmark datasets using two representative styles: Arcane and Pixar.

Live Demo

Pipeline

Our ToonifyGB framework consists of two stages: Stage 1 involves the generation of stylized videos, and Stage 2 focuses on the synthesis of 3D stylized head avatars using Gaussian blendshapes.
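At the core of Stage 2 is the standard linear blendshape formulation: an expressive avatar is obtained by adding expression-weighted offsets to the neutral head model. A minimal sketch of this combination is shown below; the function name `blend_avatar`, the flattened per-Gaussian attribute layout, and the toy values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def blend_avatar(neutral, deltas, weights):
    """Linearly combine a neutral head model with expression blendshapes.

    neutral: (N, D) per-Gaussian attributes of the neutral head
             (e.g. position, rotation, scale, opacity; layout assumed).
    deltas:  (K, N, D) per-blendshape offsets from the neutral model.
    weights: (K,) expression coefficients, e.g. from a face tracker.
    Returns the blended (N, D) avatar attributes.
    """
    # Sum over the blendshape axis: neutral + sum_k weights[k] * deltas[k]
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: 2 blendshapes over 4 Gaussians with 3 attributes each.
neutral = np.zeros((4, 3))
deltas = np.stack([np.ones((4, 3)), 2.0 * np.ones((4, 3))])
weights = np.array([0.5, 0.25])
avatar = blend_avatar(neutral, deltas, weights)
# Each attribute becomes 0.5 * 1.0 + 0.25 * 2.0 = 1.0
```

Because the combination is a single weighted sum, new expressions can be rendered in real time by updating only the coefficient vector each frame.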

Comparison

We present the original video head frames, the corresponding stylized video frames generated by our method, and the 3D stylized head avatars synthesized using Gaussian blendshapes.


Our method effectively captures and preserves high-frequency details in the stylized videos. Compared to the state-of-the-art method, ToonifyGB synthesizes 3D stylized head avatars with comparable quality and detail.

BibTeX


    @article{ju2025toonifygb,
      title={ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars},
      author={Ju, Rui-Yang and Huang, Sheng-Yen and Hung, Yi-Ping},
      journal={arXiv preprint arXiv:2505.10072},
      year={2025}
    }