Gaussian Splat Optimizer
Optimize 3D Gaussian Splatting scenes for real-time rendering on Apple platforms (iOS, macOS, visionOS) using Metal.
When to Use
- Optimizing .ply or .splat files for mobile/Apple GPU targets
- Reducing gaussian count for performance (pruning strategies)
- Implementing Level-of-Detail (LOD) for large scenes
- Compressing splat data for bandwidth/storage constraints
- Profiling and optimizing Metal rendering performance
- Targeting specific FPS goals on Apple hardware
Quick Start
Input: Provide a .ply/.splat file path, target device class, and FPS target.
# Analyze a splat file
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --device iphone --fps 60
Output: The skill provides:
- Point/gaussian pruning plan (opacity, size, error thresholds)
- LOD scheme suggestion (distance bins, gaussian subsets)
- Compression recommendation (if bandwidth/storage bound)
- Metal profiling checklist with shader/compute tips
Optimization Workflow
Step 1: Analyze the Scene
First, understand your scene characteristics:
- Gaussian count: Total number of splats
- Opacity distribution: Histogram of opacity values
- Size distribution: Gaussian scale statistics
- Memory footprint: Estimated GPU memory usage
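The four statistics above can be sketched in plain Python. This is an illustration only, not the logic of analyze_splat.py: the function name summarize_scene is hypothetical, and it assumes opacities are stored as pre-sigmoid logits and scales as logarithms, which is a common 3DGS .ply convention but may not match your file. The 56-byte figure is the per-gaussian layout used in the memory estimate later in this document.

```python
import math
from statistics import mean


def sigmoid(x: float) -> float:
    """Map a stored opacity logit to an opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def summarize_scene(opacity_logits, log_scales, bytes_per_gaussian=56, bins=10):
    """Summarize a scene from per-gaussian attributes (hypothetical helper).

    opacity_logits: list of floats, assumed to be pre-sigmoid values.
    log_scales: list of (sx, sy, sz) tuples, assumed to be log-space scales.
    """
    count = len(opacity_logits)

    # Opacity distribution: fixed-width histogram over [0, 1].
    histogram = [0] * bins
    for logit in opacity_logits:
        opacity = sigmoid(logit)
        histogram[min(int(opacity * bins), bins - 1)] += 1

    # Size distribution: use the largest axis of each gaussian as its size.
    sizes = [math.exp(max(axes)) for axes in log_scales]

    return {
        "count": count,
        "opacity_histogram": histogram,
        "mean_scale": mean(sizes),
        "max_scale": max(sizes),
        "memory_mb": count * bytes_per_gaussian / 1024**2,
    }
```

A histogram with most of its mass in the first bin suggests that opacity-based pruning (Step 3) will remove many gaussians at little visual cost.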
Step 2: Determine Target Device
| Device Class | GPU Budget | Max Gaussians (60fps) | Storage Mode |
|--------------|------------|-----------------------|--------------|
| iPhone (A15+) | 4-6GB unified | ~2-4M | Shared |
| iPad Pro (M1+) | 8-16GB unified | ~6-8M | Shared |
| Mac (M1-M3) | 8-24GB unified | ~8-12M | Shared/Managed |
| Vision Pro | 16GB unified | ~4-6M (stereo) | Shared |
| Mac (discrete GPU) | 8-24GB VRAM | ~10-15M | Private |
Step 3: Apply Pruning
If gaussian count exceeds device budget:
- Opacity threshold: Remove gaussians with opacity < 0.01-0.05
- Size culling: Remove sub-pixel gaussians (< 1px at target resolution)
- Importance pruning: Use LODGE algorithm for error-proxy selection
- Foveated rendering: For Vision Pro, reduce density in peripheral view
See references/pruning-strategies.md for details.
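As a rough illustration of the first two rules (opacity threshold and sub-pixel culling), the sketch below filters a list of gaussians. The Gaussian record, projected_pixels, and prune are hypothetical names, and the size test assumes a simple pinhole projection from a single representative camera; a real pipeline would evaluate footprint over many views before discarding anything.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    opacity: float  # activated opacity in [0, 1]
    radius: float   # world-space extent (e.g. largest scale axis), in metres
    depth: float    # distance from the representative camera, in metres


def projected_pixels(g: Gaussian, focal_px: float) -> float:
    """Approximate on-screen diameter in pixels under a pinhole camera."""
    return 2.0 * g.radius * focal_px / g.depth


def prune(gaussians, focal_px, min_opacity=0.01, min_pixels=1.0):
    """Keep gaussians that pass both the opacity and the size test."""
    kept = []
    for g in gaussians:
        if g.opacity < min_opacity:
            continue  # nearly transparent: contributes little to the image
        if projected_pixels(g, focal_px) < min_pixels:
            continue  # smaller than one pixel at the target resolution
        kept.append(g)
    return kept
```

Raising min_opacity toward 0.05 prunes more aggressively; compare renders before and after to confirm the change is acceptable.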
Step 4: Implement LOD (Large Scenes)
For scenes exceeding single-frame budget:
- Distance bins: Near (0-10m), Mid (10-50m), Far (50m+)
- Hierarchical structure: Octree or LoD tree for spatial queries
- Chunk streaming: Load/unload based on camera position
- Smooth transitions: Opacity blending at chunk boundaries
See references/lod-schemes.md for details.
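The distance bins and smooth transitions above can be sketched as two small functions. The names lod_level and blend_weight, the 2m transition band, and the linear fade are assumptions made for illustration; the bin edges are the ones listed above.

```python
def lod_level(distance_m: float, near: float = 10.0, mid: float = 50.0) -> int:
    """Map camera distance to a LOD index: 0 = Near, 1 = Mid, 2 = Far."""
    if distance_m < near:
        return 0
    if distance_m < mid:
        return 1
    return 2


def blend_weight(distance_m: float, boundary_m: float, band_m: float = 2.0) -> float:
    """Opacity weight for the finer level around a bin boundary.

    Returns 1.0 before the transition band, 0.0 after it, and fades
    linearly inside it. The coarser level would use 1.0 minus this weight.
    """
    start = boundary_m - band_m / 2.0
    t = (distance_m - start) / band_m
    return 1.0 - min(max(t, 0.0), 1.0)
```

For example, a chunk exactly at the 10m boundary gets a weight of 0.5 for the Near level, so both levels contribute equally at the crossover.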
Step 5: Apply Compression (If Needed)
For bandwidth/storage constraints:
| Method | Compression | Use Case |
|--------|-------------|----------|
| SOGS | 20x | Web delivery, moderate quality |
| SOG | 24x | Web delivery, better quality |
| CodecGS | 30x+ | Maximum compression |
| C3DGS | 31x | Fast rendering priority |
See references/compression.md for details.
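To get a feel for what these ratios mean for a given scene, the arithmetic can be sketched as below. It assumes the ratios apply to the 56-byte-per-gaussian layout used in the memory estimate later in this document, which may not be the baseline the methods were measured against, and it treats CodecGS's "30x+" as exactly 30x. The names COMPRESSION_RATIO and compressed_size_mb are hypothetical.

```python
# Ratios copied from the table above; "30x+" is treated as a lower bound.
COMPRESSION_RATIO = {"SOGS": 20, "SOG": 24, "CodecGS": 30, "C3DGS": 31}


def compressed_size_mb(num_gaussians: int, method: str, bytes_per_gaussian: int = 56) -> float:
    """Estimate compressed size in MB from the uncompressed footprint."""
    raw_mb = num_gaussians * bytes_per_gaussian / 1024**2
    return raw_mb / COMPRESSION_RATIO[method]


if __name__ == "__main__":
    for method in COMPRESSION_RATIO:
        print(f"{method}: {compressed_size_mb(5_000_000, method):.1f} MB")
```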
Step 6: Profile and Optimize Metal
- Choose storage mode: Private for static data, Shared for dynamic
- Optimize shaders: Function constants, thread occupancy
- Profile with Xcode: GPU Frame Capture, Metal System Trace
- Iterate: Measure, optimize, repeat
See references/metal-profiling.md for details.
Common Pitfalls
1. Point Cloud Density Mismatch
Problem: Gaussian count doesn't match your scene complexity, causing either visual artifacts or wasted GPU resources.
- Too sparse (undersampling): Visible gaps, blockiness, loss of fine details
- Too dense (oversampling): Exceeds device budget, causes frame drops, GPU thrashing
Debugging:
# Analyze gaussian distribution
python ~/.claude/skills/gsplat-optimizer/scripts/analyze_splat.py scene.ply --histogram
# Check against device budget
# Compare total_gaussians vs. device_max in the output table
Strategy:
- Start with device budget from Step 2 (e.g., 4M for iPhone)
- If scene exceeds budget by >20%, apply pruning before training
- If visual quality drops too much after pruning, consider LOD or chunking
- Use importance-weighted sampling (LODGE) to remove low-contribution gaussians, not just low-opacity ones
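The budget check in this strategy can be sketched as a small decision function. The device keys and the name recommend are hypothetical, the budgets are the upper ends of the Step 2 ranges, and the "marginal" outcome for scenes less than 20% over budget is an interpretation rather than something stated above.

```python
# Upper end of each "Max Gaussians (60fps)" range from Step 2.
DEVICE_MAX_GAUSSIANS = {
    "iphone": 4_000_000,
    "ipad_pro": 8_000_000,
    "mac": 12_000_000,
    "vision_pro": 6_000_000,
    "mac_discrete": 15_000_000,
}


def recommend(total_gaussians: int, device: str, tolerance: float = 0.20):
    """Return (action, gaussians_over_budget) for a scene and device class."""
    budget = DEVICE_MAX_GAUSSIANS[device]
    excess = max(0, total_gaussians - budget)
    overshoot = total_gaussians / budget - 1.0
    if overshoot <= 0:
        return ("within budget", 0)
    if overshoot <= tolerance:
        # Assumption: small overshoots are profiled before committing to pruning.
        return ("marginal: profile first", excess)
    return ("prune", excess)
```

For a 5M-gaussian scene on iPhone, the overshoot is 25%, so the sketch recommends pruning about 1M gaussians.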
2. Training Instability (Gradient Explosions, Divergence)
Problem: During optimization (if fine-tuning on device), gaussian parameters diverge, causing:
- Loss suddenly jumps to NaN
- Gaussians disappear or explode in scale
- Model becomes unrecoverable mid-session
Debugging:
# Monitor loss during training
tail -f training.log | grep -E "loss|nan|inf"
# Check scale parameters for NaN/inf or runaway values
python -c "
import numpy as np
from plyfile import PlyData
ply_data = PlyData.read('scene.ply')
# Property access returns a NumPy array; asarray keeps min()/max() available
scales = np.asarray(ply_data['vertex']['scale_0'])
print(f'Scale range: {scales.min():.6f} to {scales.max():.6f}')
print(f'Any NaN/inf: {(~np.isfinite(scales)).any()}')
"
Strategy:
- Gradient clipping: Cap gradient updates to ±0.1 scale per step
- Learning rate decay: Start at 1e-4, decay by 0.95 every epoch
- Loss regularization: Add L2 penalty on scale magnitudes to prevent explosions
- Checkpoint early: Save state every 10 iterations; rollback if loss spikes
- Freeze covariance: If converged, stop updating scale/rotation after 80% of training
- For device training: Reduce batch size or resolution if instability persists
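The clipping, decay, and freeze rules above can be sketched without any training framework. The function names are hypothetical, the update is plain gradient descent for illustration, and raising on non-finite gradients stands in for the "rollback if loss spikes" step; a real trainer would restore the last checkpoint at that point.

```python
import math


def clip(value: float, limit: float = 0.1) -> float:
    """Cap a single update to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def learning_rate(epoch: int, base_lr: float = 1e-4, decay: float = 0.95) -> float:
    """Exponential decay: multiply the rate by `decay` once per epoch."""
    return base_lr * decay**epoch


def update_scales(scales, grads, epoch, freeze=False):
    """Apply one clipped descent step to the scale parameters.

    With freeze=True the scales are returned unchanged, mirroring the
    "freeze covariance" rule late in training.
    """
    if freeze:
        return list(scales)
    if not all(math.isfinite(g) for g in grads):
        # Signal the caller to roll back to the last checkpoint.
        raise ValueError("non-finite gradient; roll back to last checkpoint")
    lr = learning_rate(epoch)
    return [s - clip(lr * g) for s, g in zip(scales, grads)]
```

With the default rate of 1e-4, the clip only engages for gradients above 1000 in magnitude, which is the explosion case it is meant to contain.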
3. Memory Limitations (OOM Errors on Large Scenes)
Problem: Scene exceeds available unified memory, causing allocation failures or GPU stalls.
- iPhone: 4–6GB shared between app + GPU
- iPad Pro: 8–16GB shared
- Vision Pro: 16GB shared (stereo rendering doubles per-frame work, not the stored gaussian count)
Debugging:
# Estimate memory footprint
python << 'EOF'
num_gaussians = 5_000_000 # Your count
bytes_per_gaussian = 56 # pos (12) + scale (12) + rot quaternion (16) + opacity (4) + SH DC (12)
total_mb = (num_gaussians * bytes_per_gaussian) / (1024 ** 2)
print(f"Est. memory: {total_mb:.1f} MB")
print(f"Safe for iPhone A15: {total_mb < 2000}") # Leave headroom for app
EOF
# Monitor live memory in Xcode
# Memory graph + Allocations instrument during scene load
Strategy:
- Chunking for large scenes: Break into 1–4M gaussian chunks, stream based on camera distance
- Quantization: Store gaussians in FP16 instead of FP32 (2x memory reduction)
- Pruning first: Remove <0.01 opacity or sub-pixel gaussians before transfer to device
- Lazy loading: Keep only active LOD level in memory; unload far chunks
- Vision Pro consideration: Stereo rendering draws every gaussian once per eye (2x the work); cap the scene at about 4M gaussians
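The quantization and chunking items can be estimated before touching the device. The sketch below assumes the 56-byte FP32 layout from the estimate above, assumes FP16 halves every field, and splits a scene into near-equal chunks no larger than 4M gaussians. The names footprint_mb and plan_chunks are hypothetical.

```python
import math


def footprint_mb(num_gaussians: int, bytes_per_gaussian: int = 56, fp16: bool = False) -> float:
    """Estimated GPU memory in MB; FP16 is assumed to halve every field."""
    per_gaussian = bytes_per_gaussian // 2 if fp16 else bytes_per_gaussian
    return num_gaussians * per_gaussian / 1024**2


def plan_chunks(num_gaussians: int, max_chunk: int = 4_000_000):
    """Split a scene into the fewest near-equal chunks of at most max_chunk."""
    n_chunks = max(1, math.ceil(num_gaussians / max_chunk))
    base, extra = divmod(num_gaussians, n_chunks)
    # The first `extra` chunks take one additional gaussian each.
    return [base + (1 if i < extra else 0) for i in range(n_chunks)]


if __name__ == "__main__":
    total = 10_000_000
    print(f"FP32: {footprint_mb(total):.1f} MB, FP16: {footprint_mb(total, fp16=True):.1f} MB")
    print(f"Chunks: {plan_chunks(total)}")
```

This only sizes the chunks; assigning gaussians to chunks spatially (for example by octree cell) is what makes distance-based streaming possible.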
4. Quality/Speed Trade-Offs (Over-Optimization for One Metric)
Problem: Optimizing heavily for one metric breaks another:
- Maximize FPS → visual artifacts: Over-pruning removes important geometry
- Maximize quality → frame drops: Too many gaussians for target device
- Minimize memory → banding/posterization: Excessive quantization or LOD culling
Debugging:
# Profile before/after each change
python << 'EOF'
# Illustrative placeholder values; replace with your own measurements
metrics = {
    "original": {"fps": 60, "gaussians": 5_000_000, "artifacts": "none"},
    "after_pruning": {"fps": 58, "gaussians": 3_500_000, "artifacts": "block edges visible"},
}
for label, m in metrics.items():
    print(f"{label}: {m['fps']}fps, {m['gaussians']/1e6:.1f}M, {m['artifacts']}")
EOF
Strategy:
- Define priority: Is this device speed-critical (AR, real-time) or quality-focused (preview)?
- Measure baseline: Profile original unoptimized scene first
- Iterate incrementally: Apply one optimization (pruning OR compression OR LOD), measure, decide
- Preserve quality metrics: Keep PSNR/SSIM scores; stop pruning if quality drops >1dB
- Target range: Leave headroom rather than tuning to exactly 60fps; a scene that only just reaches 60fps will drop below it once the device throttles
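The "stop pruning if quality drops >1dB" rule can be sketched as a PSNR gate. This is a minimal version that assumes images are flattened lists of pixel values in [0, 1]; the names psnr and accept_pruning are hypothetical, and SSIM is left out because it needs windowed statistics.

```python
import math


def psnr(reference, test, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak**2 / mse)


def accept_pruning(baseline_psnr: float, pruned_psnr: float, max_drop_db: float = 1.0) -> bool:
    """Accept a pruning step only if PSNR falls by at most max_drop_db."""
    return baseline_psnr - pruned_psnr <= max_drop_db
```

Both PSNR values should be measured against the same ground-truth views, so that the difference reflects the pruning step alone.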
5. Real-Time Rendering Failures (Frame Drops, Shader Compilation)
Problem: Rendering pipeline stalls despite low gaussian count:
- First frame (cold start): 2–5s delay while shaders compile
- Mid-scene: Frame drops spike when new LOD levels load
- Smooth playback → stuttering after 30–60s
Debugging:
# Capture Metal frame statistics
# In Xcode: Product > Scheme > Edit Scheme... > Run
# Diagnostics tab: enable Metal API Validation
# Options tab: set GPU Frame Capture to Metal
# Check shader compilation time
python ~/.claude/skills/gsplat-optimizer/scripts/metal_profile.py \
--capture-shader-compile \
--target iphone14
# Monitor frame time distribution
tail -f xcode.log | grep -E "frame_time|stutter"
Strategy:
- Pre-warm shader cache: Compile all function variants on first load (avoid runtime jank)
- Limit LOD transitions: If using multiple LOD levels, cap transitions to 2 per frame
- Asynchronous streaming: Load new geometry chunks on background thread, upload in-between frames
- Device-specific tuning:
  - iPhone: Keep draw calls < 50, geometry per call < 500K gaussians
  - Mac: More generous; aim for < 2M gaussians per draw call
  - Vision Pro: Account for stereo; effective capacity is half the budget
- Profile regimen: Run Metal System Trace before and after each optimization; track:
  - GPU utilization (target 70–85%)
  - Shader time (target <10ms)
  - Memory bandwidth (target <50GB/s)
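To compare traces before and after an optimization, it helps to reduce a run to a few numbers. The sketch below assumes you have exported per-frame times in milliseconds as a plain list (how you extract them from a trace is not covered here) and reports the frame budget, mean, 95th percentile, and number of frames over budget. The name frame_stats is hypothetical.

```python
def frame_stats(frame_times_ms, target_fps: int = 60):
    """Summarize a list of per-frame times (ms) against a target frame rate."""
    budget_ms = 1000.0 / target_fps
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (95 * n + 99) // 100 - 1
    return {
        "budget_ms": budget_ms,
        "mean_ms": sum(ordered) / n,
        "p95_ms": ordered[p95_index],
        "over_budget": sum(1 for t in ordered if t > budget_ms),
    }
```

A low mean with a high 95th percentile points at intermittent stalls (shader compilation, LOD loads) rather than a scene that is uniformly too heavy.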
Key Metrics
| Metric | Target | How to Measure |
|--------|--------|----------------|
| Frame time | 16.6ms (60fps) | Metal System Trace |
| GPU memory | < device budget | Xcode Memory Graph |
| Bandwidth | < 50GB/s | GPU Counters |
| Shader time | < 10ms | GPU Frame Capture |
Reference Implementation
MetalSplatter is the primary reference for Swift/Metal gaussian splatting:
- Repository: https://github.com/scier/MetalSplatter
- Supports iOS, macOS, visionOS
- ~8M splat capacity with v1.1 optimizations
- Stereo rendering for Vision Pro
Getting Started with MetalSplatter
git clone https://github.com/scier/MetalSplatter.git
cd MetalSplatter
open SampleApp/MetalSplatter_SampleApp.xcodeproj
# Set the scheme's build configuration to Release for best performance
Resources
Reference Documentation
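- Pruning Strategies (references/pruning-strategies.md) - Gaussian reduction techniques
- LOD Schemes (references/lod-schemes.md) - Level-of-detail approaches
- Compression (references/compression.md) - Bandwidth/storage optimization
- Metal Profiling (references/metal-profiling.md) - Apple GPU optimization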
Research Papers
- LODGE - LOD for large-scale scenes
- FLoD - Flexible LOD for variable hardware
- Voyager - City-scale mobile rendering
- 3DGS Compression Survey