3D Gaussian Splatting

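Overview

3D Gaussian Splatting (3DGS) is a real-time radiance field rendering technique presented at SIGGRAPH 2023. Proposed by Bernhard Kerbl and colleagues, it represents a scene as millions of 3D Gaussian functions and projects ("splats") them onto the 2D image plane to produce photorealistic views in real time. Whereas NeRF (Neural Radiance Fields) methods typically need several seconds to several minutes to render a frame, 3DGS renders at tens of frames per second (30 fps or more) on a consumer GPU, which drew intense interest from industry.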

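Technical Principles

The core components of 3DGS are as follows.

Gaussian primitives: Each Gaussian has a 3D position (mu), a covariance matrix (Sigma, factored into rotation and scale), an opacity (alpha), and view-dependent color encoded as spherical harmonics (SH) coefficients.

Initialization: The sparse point cloud produced by a Structure-from-Motion (SfM) pipeline, usually COLMAP, provides the initial Gaussian positions.

Optimization: A tile-based rasterizer projects each Gaussian into the camera view and alpha-blends the splats front to back to compute the final pixel color. The rendered image is compared with the corresponding training image, and the photometric loss is backpropagated to update the Gaussian parameters.

Adaptive density control (ADC): During training, Gaussians with large gradient magnitudes are cloned or split, and Gaussians that contribute little are pruned, so the representation becomes denser where the scene needs more detail.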
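The front-to-back alpha blending used in the optimization step can be illustrated for a single pixel. The following is a minimal NumPy sketch, not the tile-based GPU rasterizer itself; the function name, argument names, and the early-stop threshold are assumptions made for illustration.

```python
import numpy as np

def composite_front_to_back(colors, alphas, depths, stop_transmittance=1e-4):
    """Blend the splats covering one pixel, nearest first.

    colors: (N, 3) RGB per splat; alphas: (N,) opacity of each splat at
    this pixel; depths: (N,) camera-space depth. All names are illustrative.
    """
    order = np.argsort(depths)        # front (small depth) to back
    pixel = np.zeros(3)
    transmittance = 1.0               # fraction of light not yet blocked
    for i in order:
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < stop_transmittance:
            break                     # pixel is effectively opaque
    return pixel, transmittance

colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # red, blue
alphas = np.array([0.5, 0.5])
depths = np.array([2.0, 1.0])                           # blue is nearer
pixel, remaining = composite_front_to_back(colors, alphas, depths)
```

Here the nearer blue splat contributes half of the pixel, and the red splat behind it contributes half of what remains, giving the color (0.25, 0, 0.5) with a quarter of the light still unblocked.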

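Comparison with NeRF

NeRF is an implicit representation: an MLP (multilayer perceptron) encodes the scene, and images are produced by volumetric rendering that integrates along camera rays. 3DGS instead uses an explicit, point-based representation and renders tens to hundreds of times faster. Its drawbacks are high memory use (up to several GB, depending on the scene) and a rendering cost that grows roughly linearly with the number of Gaussians.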
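The memory figure can be made concrete with a back-of-the-envelope estimate. This sketch assumes each Gaussian stores a position (3 values), scale (3), rotation quaternion (4), opacity (1), and RGB spherical harmonics up to degree 3, all as 32-bit floats; the function names and the parameter layout are assumptions for illustration, and real implementations may store more or less per Gaussian.

```python
def gaussian_bytes(sh_degree=3, bytes_per_value=4):
    """Storage for one Gaussian under the assumed parameter layout."""
    sh_coeffs = 3 * (sh_degree + 1) ** 2   # RGB x (degree + 1)^2 basis functions
    position, scale, rotation, opacity = 3, 3, 4, 1
    return (position + scale + rotation + opacity + sh_coeffs) * bytes_per_value

def scene_megabytes(num_gaussians, **kwargs):
    """Total parameter storage in MB (10^6 bytes)."""
    return num_gaussians * gaussian_bytes(**kwargs) / 1e6

per_gaussian = gaussian_bytes()             # 59 values * 4 bytes = 236 bytes
one_million = scene_megabytes(1_000_000)    # 236.0 MB
five_million = scene_megabytes(5_000_000)   # 1180.0 MB
```

Under these assumptions a scene with a few million Gaussians already exceeds a gigabyte of raw parameters, and training needs more on top of that for gradients and optimizer state.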

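Applications

Games and VR/XR: Because it renders in real time, 3DGS is being actively studied for dynamic scene capture and VR content creation. Companies such as Intel and NVIDIA have released versions optimized for their own hardware.

Film and TV VFX: Pipelines are being adopted that shoot a physical set with multiple cameras, build a digital twin with 3DGS, and thereby reduce CG compositing costs.

Autonomous driving and robotics: There is active research on using 3DGS for 3D scene understanding and for building simulation environments. Companies such as Waymo and Tesla are applying it to driving-scene reconstruction.

Cultural heritage and digital twins: Artifacts and buildings photographed with high-resolution cameras can be digitized with 3DGS for high-fidelity 3D archiving.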

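Limitations and Research Directions

Dynamic objects: 3DGS as originally proposed is optimized for static scenes and struggles to represent moving objects. Extensions such as Dynamic 3DGS and Deformable 3DGS are under active development.

Memory footprint: Because millions of Gaussians must be stored, compression and quantization (Compact 3DGS, LightGaussian) are active research topics.

Editability: Work such as Gaussian Grouping and GaussianEditor brings editing techniques developed for NeRF to 3DGS.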
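To give a feel for why quantization helps with the memory footprint, here is a generic 8-bit min-max quantizer applied to stand-in spherical harmonics coefficients. This is a textbook scheme chosen for illustration only; it is not the method of Compact 3DGS or LightGaussian, and the function names and random test data are assumptions.

```python
import numpy as np

def quantize_uint8(values):
    """Map float values onto 256 evenly spaced levels between min and max."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / 255.0 or 1.0   # fall back to 1.0 for constant input
    codes = np.round((values - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float values from the 8-bit codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
sh = rng.normal(size=(1000, 48)).astype(np.float32)   # stand-in SH coefficients
codes, lo, scale = quantize_uint8(sh)
restored = dequantize(codes, lo, scale)
max_error = float(np.abs(restored - sh).max())
```

The codes take a quarter of the space of the float32 input, at the price of a reconstruction error of at most about half a quantization step.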

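Major Open-Source Implementations

The original authors' official implementation is available on GitHub (graphdeco-inria/gaussian-splatting). A range of derived implementations and web viewers has followed, and tools such as nerfstudio, Luma AI, and Polycam support the technique.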

3D Gaussian Splatting

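A super-fast 3D scene rendering technique that came out in 2023. Put simply, you take a bunch of photos and AI turns them into 3D, and the key point is that it runs in real time.

Core Idea

The scene is represented as millions of "Gaussian blobs." In math, a Gaussian is the bell-shaped curve; here, just think of it as a semi-transparent, ellipsoid-shaped point floating in 3D space.

Each of these points carries its own position, size, orientation, and color. When the camera viewpoint changes, the points are quickly projected (splatted) onto 2D to draw the frame in real time.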
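If you want to see what "size, orientation, and projecting to 2D" look like in numbers, here is a small NumPy sketch. It builds a blob's 3D covariance from a scale vector and a rotation quaternion (Sigma = R S S^T R^T), then flattens it into a 2D covariance using a local linear approximation of a pinhole camera. The function names, the (w, x, y, z) quaternion order, and the camera parameters (W, fx, fy) are assumptions for illustration.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given as (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(scale, quat):
    """Sigma = R S S^T R^T, which is symmetric and positive semi-definite."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    M = R @ np.diag(scale)
    return M @ M.T

def project_covariance(cov3d, mean_cam, W, fx, fy):
    """2D covariance J W Sigma W^T J^T for a pinhole camera.

    mean_cam: blob center in camera coordinates; W: rotation part of the
    world-to-camera transform; fx, fy: focal lengths in pixels.
    """
    tx, ty, tz = mean_cam
    J = np.array([
        [fx / tz, 0.0, -fx * tx / tz**2],
        [0.0, fy / tz, -fy * ty / tz**2],
    ])
    return J @ W @ cov3d @ W.T @ J.T

cov3d = covariance_3d(scale=[0.2, 0.1, 0.05], quat=[1.0, 0.0, 0.0, 0.0])
cov2d = project_covariance(cov3d, mean_cam=[0.0, 0.0, 2.0],
                           W=np.eye(3), fx=500.0, fy=500.0)
```

With no rotation and the blob sitting straight ahead at depth 2, the 2D covariance comes out as a diagonal matrix with entries 2500 and 625, so the blob shows up as an ellipse twice as wide as it is tall.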

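How Is It Different from NeRF?

NeRF also builds 3D from photos, but rendering is slow. A single high-quality frame takes anywhere from a few seconds to a few minutes.

3DGS gets that up to 30 to 100 frames per second. VR needs at least 60 fps, and NeRF just couldn't get there.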

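Where Is It Used?

VR and games: Capture a real place and explore it in VR in real time.

Film VFX: Digitize a real set to cut CG compositing costs.

Autonomous driving: Reconstruct road scenes in 3D and use them as simulation data.

Cultural heritage: Archive artifacts in 3D.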

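Downsides

Memory-hungry: It has to store millions of Gaussians, so a scene can take gigabytes.

Weak with moving objects: Things that move, like people, are still hard to represent.

Compression research keeps coming out, so it will probably get lighter over time.

It's open source, so you can run it yourself from GitHub. You can also try it on a smartphone with the Luma AI app.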

3D Gaussian Splatting

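It's a technology that feels like magic: take lots of photos, and the computer builds a 3D shape for you.

For example, walk around your room and take 100 photos. The technology looks at all the photos, figures out "Oh, so this is what the room looks like!" and builds a 3D version of your room.

The special part is that it's really fast. You can look around in real time, just like in a game. Turn your head to the left, and the 3D view from the left shows up right away!

How does it do that? It breaks the scene into millions of tiny see-through blobs. It's like a watercolor painting made by layering drops of paint on top of each other!

Thanks to this technology, you can see famous museum pieces in 3D on the internet, and movies can use a computer to make scenes that look like real places. Someday you might put on a VR headset and explore the whole real world in 3D!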

3D Gaussian Splatting: Real-Time Radiance Field Rendering

Overview

3D Gaussian Splatting (3DGS) is a real-time radiance field rendering technique presented at SIGGRAPH 2023. Developed by Bernhard Kerbl and colleagues, it represents a scene as millions of three-dimensional Gaussian functions. Each Gaussian is described by a position, a covariance matrix that captures orientation and scale, an opacity, and spherical harmonic coefficients for view-dependent color, and is projected onto the 2D image plane in a step called splatting. Whereas NeRF-based approaches typically take several seconds to minutes to render a frame, 3DGS produces photorealistic images at more than 30 frames per second on consumer GPUs, which has attracted significant industry interest.

Technical Principles

3DGS has four key components:

Gaussian Primitives: Each Gaussian is defined by its 3D position (mu), covariance matrix (Sigma, decomposed into rotation and scale), opacity (alpha), and spherical harmonic (SH) coefficients that encode view-dependent color.

Initialization: The sparse point cloud produced by a Structure-from-Motion (SfM) pipeline, often COLMAP, provides the initial positions of the Gaussians.

Optimization: A tile-based rasterizer projects each Gaussian onto the camera view and alpha-blends the splats from front to back to determine the final pixel colors. The rendered image is compared with the real training image, and the resulting photometric loss is backpropagated to refine the Gaussians.

Adaptive Density Control (ADC): Gaussian density is adjusted throughout training. Gaussians with large gradient magnitudes are cloned or split, and those that contribute little are pruned, which lets the model capture fine scene detail.
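The adaptive density control rule can be sketched as a per-Gaussian decision. This is a simplified illustration under stated assumptions, not the reference implementation: "contributes little" is approximated by a low opacity, the choice between cloning and splitting is made on the Gaussian's size (small ones are cloned, large ones are split), and the function name and all threshold values are placeholders.

```python
def densify_action(grad_norm, max_scale, opacity,
                   grad_threshold=2e-4, scale_threshold=0.01,
                   min_opacity=0.005):
    """Decide what to do with one Gaussian during a densification pass.

    grad_norm: magnitude of the accumulated positional gradient;
    max_scale: largest axis length of the Gaussian;
    opacity: current opacity. Thresholds are illustrative placeholders.
    """
    if opacity < min_opacity:
        return "prune"                 # nearly transparent, contributes little
    if grad_norm <= grad_threshold:
        return "keep"                  # region is already well reconstructed
    # Large gradient: the region needs more detail.
    return "split" if max_scale > scale_threshold else "clone"

actions = [
    densify_action(grad_norm=1e-3, max_scale=0.5, opacity=0.9),    # split
    densify_action(grad_norm=1e-3, max_scale=0.001, opacity=0.9),  # clone
    densify_action(grad_norm=1e-5, max_scale=0.5, opacity=0.9),    # keep
    densify_action(grad_norm=1e-3, max_scale=0.5, opacity=0.001),  # prune
]
```

In a real training loop this decision would run over all Gaussians at once every few hundred iterations, with the thresholds treated as tunable hyperparameters.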

Comparison with NeRF

NeRF is an implicit representation: a multilayer perceptron (MLP) encodes the scene, and images are produced by volumetric rendering that integrates along camera rays. 3DGS instead uses an explicit, point-based representation, which makes rendering tens to hundreds of times faster. However, 3DGS consumes a lot of memory, potentially gigabytes per scene, and its rendering cost grows roughly linearly with the number of Gaussians.

Applications

3DGS is being applied in several fields:

Gaming & VR/XR: Real-time rendering makes 3DGS attractive for dynamic scene capture and immersive VR content creation, and hardware vendors such as Intel and NVIDIA are developing optimized implementations.

Film & Television VFX: Productions can capture a real set with multiple cameras and build a digital twin with 3DGS, which reduces the cost of CGI compositing.

Autonomous Vehicles & Robotics: 3DGS is used in research on 3D scene understanding and on building simulation environments, and companies such as Waymo and Tesla are exploring it for driving-scene reconstruction.

Cultural Heritage & Digital Twins: High-resolution photography combined with 3DGS enables high-fidelity digital archiving of artifacts and architectural structures.

Limitations and Future Directions

Several challenges remain:

Dynamic Object Handling: Current implementations work best on static scenes and struggle with moving objects. Extensions such as Dynamic 3DGS and Deformable 3DGS address this.

Memory Efficiency: Storing millions of Gaussians is costly, which motivates compression and quantization work such as Compact 3DGS and LightGaussian.

Editing Capabilities: Approaches such as Gaussian Grouping and GaussianEditor adapt NeRF editing techniques to 3DGS.

Open-Source Implementations

The original 3DGS codebase is publicly available on GitHub (graphdeco-inria/gaussian-splatting). Derivative implementations and web-based viewers have followed, and tools such as nerfstudio, Luma AI, and Polycam support the technique.
