NeRF-THIS: Neural Radiance Field based Talking Head Synthesis Incorporating Text-to-Speech
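- Keywords: talking head generation, TTS, NeRF, text-driven, deep learning
- Publisher: Sogang University Graduate School
- Advisor: 박형민
- Year of publication: 2024
- Degree conferral date: February 2024
- Degree: Master's
- Department and major: Graduate School, Department of Artificial Intelligence
- URI: http://www.dcollection.net/handler/sogang/000000077132
- UCI: I804:11029-000000077132
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.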
Abstract
With the progression of deep learning techniques, automatic video generation from audio or text inputs has emerged as a highly promising and rapidly evolving area of research. This paper presents NeRF-THIS (Neural Radiance Field based Talking Head Synthesis Incorporating Text-to-Speech), a novel approach to text-driven talking head generation that combines the strengths of text-based audio generation models with audio-driven video generation models. The method builds a Neural Radiance Fields (NeRF) based talking head generation architecture integrated with text-to-speech (TTS). This approach has a number of advantages: 1) It needs only 5 minutes of training data. 2) It is not constrained by Automatic Speech Recognition (ASR) models, thereby offering freedom from language barriers. 3) It can support real-time inference at low computational cost. Our findings indicate a promising direction for future research in multimedia content generation, opening new avenues for applications in virtual reality, digital entertainment, and interactive media.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Overview of the Proposed Method
2 Related Works
2.1 Speech Synthesis
2.1.1 Neural Audio Codec based TTS
2.2 Talking Head Generation
2.2.1 Text-Driven
2.2.2 Audio-Driven
2.2.3 NeRF-Based Talking Head Synthesis
3 Proposed Method
3.1 Overview
3.2 Input Features
3.2.1 Audio and Text Features
3.2.2 Video Features
3.3 Text-to-Speech Module
3.4 NeRF-based Talking Head Generation Module
3.5 Loss Function
4 Experiments
4.1 Dataset and Evaluation Metrics
4.2 Quantitative Evaluation Results
4.3 Qualitative Evaluation Results
4.3.1 User Study
4.4 Advantages of TTS-integrated Synthesis
4.4.1 Synchronization
4.4.2 Efficiency
5 Conclusion
5.1 Conclusion
5.2 Limitations and Future Work
Bibliography