Programmable-Room: Interactive Textured 3D Room Meshes Generation Using Visual Programming
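- Publisher: Sogang University Graduate School
- Advisor: Suk-Ju Kang (강석주)
- Year of publication: 2024
- Degree conferral date: February 2024
- Degree: Master's
- Department and major: Graduate School, Department of Artificial Intelligence
- Actual URI: http://www.dcollection.net/handler/sogang/000000077131
- UCI: I804:11029-000000077131
- Language of text: English
- Copyright: Sogang University theses are protected by copyright.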
Abstract
We present Programmable-Room, a framework that interactively generates and edits a 3D room mesh from natural language instructions. By decomposing the 3D indoor scene generation task into simpler tasks, Programmable-Room allows for user-designed room shapes and textures. More specifically, Programmable-Room interprets user-provided descriptions to create plausible 3D coordinates for room meshes, to generate panoramic images for textures, to construct 3D meshes by integrating the coordinates and panoramic images, and to arrange furniture using an existing LLM-based model. Users can specify any of these actions alone or in combination as needed. To build a unified framework for this wide range of decomposed tasks, Programmable-Room incorporates visual programming, which uses a large language model (LLM) to write a program, an ordered list of subtasks, for each instruction. We developed most of the modules that carry out these subtasks ourselves. For example, we use a pre-trained large-scale diffusion model to generate panoramic images conditioned simultaneously on text and visual prompts (i.e., layout, depth, and semantic map). Specifically, we improve the performance of panoramic image generation by optimizing the training objective with a 1D representation of the panoramic scene obtained from a bidirectional LSTM. We demonstrate Programmable-Room's flexibility in generating and editing 3D room meshes, and show that our framework outperforms an existing model both quantitatively and qualitatively.
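As an illustration of the visual programming idea described in the abstract, where a program is an ordered list of subtasks that modules execute in turn, the following is a minimal sketch and not the thesis implementation. The module names (`gen_coordinates`, `gen_texture`, `build_mesh`), the `$name` reference syntax, and the stub return values are all assumptions made for this example; in the actual framework the program would be written by an LLM and the modules would be real models.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in modules. Each one only returns placeholder data.

def gen_coordinates(shape: str, height: float = 2.5) -> List[Tuple[float, float, float]]:
    """Stub: return the 8 corner coordinates of a box-shaped room."""
    width, depth = (4.0, 3.0) if shape == "rectangle" else (3.0, 3.0)
    return [(x, y, z)
            for z in (0.0, height)
            for x in (0.0, width)
            for y in (0.0, depth)]


def gen_texture(prompt: str) -> str:
    """Stub: a real module would return a panoramic image, not a string."""
    return f"panorama<{prompt}>"


def build_mesh(coords: List[Tuple[float, float, float]], texture: str) -> Dict[str, Any]:
    """Stub: combine coordinates and texture into a mesh-like record."""
    return {"vertices": coords, "texture": texture}


MODULES: Dict[str, Callable[..., Any]] = {
    "gen_coordinates": gen_coordinates,
    "gen_texture": gen_texture,
    "build_mesh": build_mesh,
}

# One program step: (output variable, module name, keyword arguments).
Step = Tuple[str, str, Dict[str, Any]]


def run_program(program: List[Step]) -> Dict[str, Any]:
    """Execute the steps in order, storing each result under its output name."""
    state: Dict[str, Any] = {}
    for out_name, module_name, args in program:
        # Arguments written as "$name" refer to the output of an earlier step.
        resolved = {
            key: (state[value[1:]]
                  if isinstance(value, str) and value.startswith("$")
                  else value)
            for key, value in args.items()
        }
        state[out_name] = MODULES[module_name](**resolved)
    return state


# A hand-written program standing in for what an LLM might emit for an
# instruction such as "make a rectangular bedroom with a wooden floor".
program: List[Step] = [
    ("coords", "gen_coordinates", {"shape": "rectangle"}),
    ("tex", "gen_texture", {"prompt": "a cozy bedroom with wooden floor"}),
    ("mesh", "build_mesh", {"coords": "$coords", "texture": "$tex"}),
]

state = run_program(program)
```

Because each step only names a module and its inputs, an edit instruction (for example, changing the texture) could in principle be expressed as a shorter program that reuses `coords` and reruns only the texture and mesh steps.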
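Abstract (translated from the Korean)
In this thesis, we introduce Programmable-Room. Programmable-Room is a framework that generates and edits 3D room meshes interactively with the user through natural language instructions. In addition, by decomposing the 3D indoor scene generation task into smaller, simpler tasks, Programmable-Room lets users design room shapes and textures as they wish. Specifically, Programmable-Room interprets user-provided descriptions to generate plausible 3D coordinates for the room mesh, generates panoramic images for textures, constructs 3D meshes by integrating the coordinates and panoramic images, and uses an existing LLM-based model to arrange furniture. This allows users to request single or combined tasks as needed. To build a unified framework for this variety of decomposed tasks, Programmable-Room uses visual programming (VP), which employs an LLM to write a program, an ordered list of subtasks, for each instruction. For each subtask, we developed the modules ourselves. For example, we use a pre-trained large-scale diffusion model to generate panoramic images conditioned on text and visual prompts (layout, depth, and semantic map). Specifically, we optimize the training objective using a 1D representation of the panoramic scene obtained from a bidirectional LSTM to improve the performance of panoramic image generation. Through experimental results, we show Programmable-Room's flexibility in generating and editing room meshes, and demonstrate its quantitative and qualitative superiority over an existing method.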
Table of Contents
I Introduction
II Related Work
2.1 LLMs for Vision Tasks
2.2 Indoor Scene Synthesis
2.3 Text-based Generation
III Proposed Method
3.1 Visual Programming
3.2 Essential Components of Programmable-Room
3.3 Furniture Allocation
IV Experiments
4.1 Implementation Details
4.1.1 Task Instructions
4.1.2 Equirectangular Projection
4.1.3 Panoramic Depth Map Calculation
4.1.4 Evaluation Metrics
4.1.5 User Study
4.2 Datasets
4.3 Evaluation Metrics
4.4 Ablation Study
4.5 Comparison with State-of-the-art Methods
4.6 Control Over Room Attributes
V Conclusion
Bibliography

