HanDiffuser: Text-to-Image Generation with Realistic Hand Appearances
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Abstract
Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting hand embeddings in the generative process. HanDiffuser consists of two components: a Text-to-Hand-Params diffusion model to generate SMPL-Body and MANO-Hand parameters from input text prompts, and a Text-Guided Hand-Params-to-Image diffusion model to synthesize images by conditioning on the prompts and hand parameters generated by the previous component. We incorporate multiple aspects of hand representation, including 3D shapes and joint-level finger positions, orientations and articulations, for robust learning and reliable performance during inference. We conduct extensive quantitative and qualitative experiments and perform user studies to demonstrate the efficacy of our method in generating images with high-quality hands.
Keywords
Hands, Diffusion, Humans