FreeUV

FreeUV generates a complete UV texture from a single face image without requiring ground-truth UV supervision during training. The method captures intricate details such as facial hair, wrinkles, occlusions, and makeup, and it remains robust across diverse scenarios, recovering coherent, high-fidelity textures.
Top to bottom: input face images, recovered UV textures, and FLAME model-based rendering.

Abstract

Recovering high-quality 3D facial textures from single-view 2D images is a challenging task, especially under the constraints of limited data and complex facial details such as wrinkles, makeup, and occlusions. In this paper, we introduce FreeUV, a novel ground-truth-free UV texture recovery framework that eliminates the need for annotated or synthetic UV data. FreeUV leverages a pre-trained stable diffusion model alongside a Cross-Assembly inference strategy to fulfill this objective. In FreeUV, separate networks are trained independently to focus on realistic appearance and structural consistency, and these networks are combined during inference to generate coherent textures. Our approach accurately captures intricate facial features and demonstrates robust performance across diverse poses and occlusions. Extensive experiments validate FreeUV's effectiveness, with results surpassing state-of-the-art methods in both quantitative and qualitative metrics. Additionally, FreeUV enables new applications, including local editing, facial feature interpolation, and texture recovery from multi-view images. By reducing data requirements, FreeUV offers a scalable solution for generating high-fidelity 3D facial textures suitable for real-world scenarios.

Key Idea

Selective domain utilization in FreeUV's texture recovery. Our Cross-Assembly strategy highlights how realistic appearance from in-the-wild images and structural consistency from 3DMM are selectively combined. FreeUV targets a UV-to-UV mapping with a Realistic and Consistent combination for optimal texture generation.

Method Overview

FreeUV leverages two modules, the Flaw-Tolerant Detail Extractor (left) and the UV Structure Aligner (middle), to separately capture realistic appearance and structural consistency. Combined during the Cross-Assembly inference phase (right), these modules produce high-quality UV textures from single-view images, without requiring ground-truth UV data.
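The train-separately, combine-at-inference pattern described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `CrossAssemblyPipeline`, the callable signatures, the choice to feed the same flawed UV image to both modules, and the toy stand-in functions are hypothetical and are not the FreeUV implementation, which builds on a pre-trained diffusion model.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class CrossAssemblyPipeline:
    """Hypothetical container for two independently trained modules.

    The signatures below are assumptions for illustration only.
    """
    detail_extractor: Callable[[np.ndarray], np.ndarray]    # appearance cues
    structure_aligner: Callable[[np.ndarray], np.ndarray]   # structural cues
    generator: Callable[[np.ndarray, np.ndarray], np.ndarray]  # stand-in backbone

    def infer(self, flawed_uv: np.ndarray) -> np.ndarray:
        # Each module runs on its own; they only meet at inference time.
        appearance = self.detail_extractor(flawed_uv)
        structure = self.structure_aligner(flawed_uv)
        return self.generator(appearance, structure)


# Toy stand-ins so the sketch runs end to end (not real networks).
def toy_detail_extractor(uv: np.ndarray) -> np.ndarray:
    return uv.astype(np.float32)


def toy_structure_aligner(uv: np.ndarray) -> np.ndarray:
    # A single-channel validity map standing in for structural guidance.
    return (uv.mean(axis=-1, keepdims=True) > 0).astype(np.float32)


def toy_generator(appearance: np.ndarray, structure: np.ndarray) -> np.ndarray:
    return appearance * structure


pipeline = CrossAssemblyPipeline(
    detail_extractor=toy_detail_extractor,
    structure_aligner=toy_structure_aligner,
    generator=toy_generator,
)
flawed_uv = np.random.default_rng(0).random((8, 8, 3))
texture = pipeline.infer(flawed_uv)
```

The point of the sketch is only the control flow: the two modules share no training step, so either could be retrained or swapped without touching the other, and the combination happens in `infer`.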

Comparison of 3D face reconstruction results. Each recovered UV texture is rendered and overlaid on the input image; our method yields the closest match to the original input. Even under challenging conditions, such as extreme lighting, facial hair, and occlusions, our approach preserves fine details and color consistency.
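As a rough illustration of the render-and-overlay comparison, the sketch below alpha-composites a rendered face onto the input image and scores how closely the two agree inside the face region. The function names `overlay_render` and `masked_l1`, the float image layout (H x W x C), and the use of a mean absolute difference are assumptions made here; they are not the evaluation protocol or metric used in the paper.

```python
import numpy as np


def overlay_render(image: np.ndarray, render: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Composite `render` over `image` where `mask` is 1 (soft masks allowed)."""
    image = np.asarray(image, dtype=np.float32)
    render = np.asarray(render, dtype=np.float32)
    alpha = np.asarray(mask, dtype=np.float32)[..., None]  # H x W -> H x W x 1
    return alpha * render + (1.0 - alpha) * image


def masked_l1(image: np.ndarray, render: np.ndarray, mask: np.ndarray) -> float:
    """Mask-weighted mean absolute difference between input and render."""
    image = np.asarray(image, dtype=np.float32)
    render = np.asarray(render, dtype=np.float32)
    alpha = np.asarray(mask, dtype=np.float32)[..., None]
    weight = alpha.sum() * image.shape[-1]  # total mask weight over all channels
    return float((np.abs(image - render) * alpha).sum() / max(weight, 1e-8))


# Tiny example: a black input, a white render, and a partly soft mask.
image = np.zeros((2, 2, 3))
render = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0], [0.5, 0.0]])

composite = overlay_render(image, render, mask)
error = masked_l1(image, render, mask)
```

A lower `masked_l1` value means the rendered texture is closer to the input inside the masked region, which is one simple way to put a number on "closest match".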

Comparison of facial UV texture recovery. Our method robustly produces realistic textures despite challenging inputs. Even with significant distortions, occlusions, and missing regions in the input data, the recovered UV textures retain fine details, smooth transitions, and consistent color tones.

Our method outperforms HRN, FFHQ-UV, and UV-IDM in capturing fine details, achieving realism, and maintaining robustness.

BibTeX

@inproceedings{yang2025_freeuv,
  title={FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy},
  author={Xingchao Yang and Takafumi Taketomi and Yuki Endo and Yoshihiro Kanamori},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
}