Mark Boss

Co-Head of 3D & Image

Stability AI

Biography

Mark Boss is the Co-Head of 3D & Image at Stability AI. He previously worked at Unity Technologies and completed his PhD in the computer graphics group of Prof. Hendrik Lensch at the University of Tübingen. His research interests lie at the intersection of machine learning and computer graphics, with a focus on inferring physical properties (shape, material, illumination) from images.

If you are interested in a research collaboration, please drop me an email with your CV.

Education

PhD in Computer Science

2023

University of Tübingen

MSc in Computer Science

2018

University of Tübingen

BSc in Computer Science

2016

Osnabrück University of Applied Sciences

Recent & Upcoming Talks

Generative AI for VFX & Games

Generative AI can unlock several tasks and subtasks in professional media production. This talk discusses recent advances in the field.

Inverse Rendering for Games

Asset production in the game industry is time-consuming, and since “The Vanishing of Ethan Carter”, photogrammetry has gained traction. While the assets produced by photogrammetry achieve incredible detail, the illumination is baked into the texture maps. This makes the assets inflexible and limits their use in games and movies without manual post-processing. In this talk, I will present our recent work on decomposing an object into its shape, reflectance, and illumination. This highly ill-posed problem becomes even more challenging when the illumination is not a single light source under laboratory conditions but unconstrained environmental illumination. Decomposing an object under this ambiguous setup enables the automated creation of relightable 3D assets for AR/VR applications, enhanced shopping experiences, games, and movies from online images.

I will present our recent methods in the field of reflectance decomposition using Neural Fields. Our methods build a neural volumetric reflectance decomposition from unconstrained image collections. Contrary to most recent works that require images captured under the same illumination, our input images are taken under varying illuminations. This practical setup enables the decomposition of images gathered from online searches and the automated creation of relightable 3D assets. Our techniques handle complex geometries with non-Lambertian surfaces, and we also extract 3D meshes with material properties from the learned reflectance volumes, enabling their use in existing graphics engines. Our latest method also enables the decomposition of unposed image collections; most recent reconstruction methods require posed collections, yet common pose recovery methods fail under highly varying illuminations or locations.

Neural Reflectance Decomposition

In this talk, I will present our recent work on decomposing an object into its shape, reflectance, and illumination. This highly ill-posed problem becomes even more challenging when the illumination is not a single light source under laboratory conditions but unconstrained environmental illumination. Decomposing an object under this ambiguous setup enables the automated creation of relightable 3D assets for AR/VR applications, enhanced shopping experiences, games, and movies from online images.

I will present our recent methods in the field of reflectance decomposition using Neural Fields. Our methods build a neural volumetric reflectance decomposition from unconstrained image collections. Contrary to most recent works that require images captured under the same illumination, our input images are taken under varying illuminations. This practical setup enables the decomposition of images gathered from online searches and the automated creation of relightable 3D assets. Our techniques handle complex geometries with non-Lambertian surfaces, and we also extract 3D meshes with material properties from the learned reflectance volumes, enabling their use in existing graphics engines. Our latest method also enables the decomposition of unposed image collections; most recent reconstruction methods require posed collections, yet common pose recovery methods fail under highly varying illuminations or locations.
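For context, the quantities these talks aim to separate are coupled by the standard rendering equation (a textbook relation, not a formula from the talk itself): the observed outgoing radiance L_o entangles geometry (the surface normal n), reflectance (the BRDF f_r), and illumination (L_i):

```latex
L_o(\mathbf{x}, \omega_o) = \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i
```

Inverse rendering observes only L_o in the images and must recover the factors inside the integral, which is why the problem is ill-posed: many combinations of reflectance and illumination can explain the same pixels.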

Publications

ReSWD: ReSTIR’d, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction.

Distribution matching is central to many vision and graphics tasks, but the widely used Wasserstein distance is too costly to compute for high-dimensional distributions. The Sliced Wasserstein Distance (SWD) offers a scalable alternative, yet its Monte Carlo estimator suffers from high variance, resulting in noisy gradients and slow convergence. We introduce Reservoir SWD (ReSWD), which integrates Weighted Reservoir Sampling into SWD to adaptively retain informative projection directions across optimization steps, yielding stable gradients while remaining unbiased. Experiments on synthetic benchmarks and real-world tasks such as color correction and diffusion guidance show that ReSWD consistently outperforms standard SWD and other variance reduction baselines.
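As a concrete anchor, here is a minimal NumPy sketch of the plain Monte Carlo SWD estimator that ReSWD improves on; the random projection directions are the source of the variance the paper targets. This is an illustrative baseline, not code from the ReSWD paper, and the function name and signature are made up for this sketch.

```python
import numpy as np

def sliced_wasserstein_sq(x, y, n_projections=128, seed=None):
    """Monte Carlo estimate of the squared Sliced Wasserstein-2 distance
    between two point sets of equal size (n, d).

    Each random unit direction gives an exact 1D optimal transport cost
    (sort and compare); averaging over directions is the noisy part.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Sample random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sets onto every direction: shape (n, n_projections).
    proj_x = np.sort(x @ theta.T, axis=0)
    proj_y = np.sort(y @ theta.T, axis=0)
    # 1D W2^2 per direction is the mean squared difference of sorted
    # projections; the final estimate averages over directions.
    return float(np.mean((proj_x - proj_y) ** 2))
```

Each direction yields an exact 1D Wasserstein cost via sorting; averaging a few of them gives a cheap but noisy estimate, and ReSWD's reservoir of informative directions targets exactly that noise.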

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional pipeline steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially-varying PBR parameters and surface normals jointly with each generated RGB view based on explicit camera control. This unique setup allows for direct relighting in a 2.5D setting, and for generating a 3D asset using our model as neural prior. We introduce various mechanisms to this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse image inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games and other visual media.

MARBLE: Material Recomposition and Blending in CLIP-Space

Editing materials of objects in images based on exemplar images is an active area of research in computer vision and graphics. We propose MARBLE, a method for performing material blending and recomposing fine-grained material properties by finding material embeddings in CLIP-space and using that to control pre-trained text-to-image models. We improve exemplar-based material editing by finding a block in the denoising UNet responsible for material attribution. Given two material exemplar-images, we find directions in the CLIP-space for blending the materials. Further, we can achieve parametric control over fine-grained material attributes such as roughness, metallic, transparency, and glow using a shallow network to predict the direction for the desired material attribute change. We perform qualitative and quantitative analysis to demonstrate the efficacy of our proposed method. We also present the ability of our method to perform multiple edits in a single forward pass and applicability to painting.
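The CLIP-space operations described above, a direction between two exemplar embeddings and a blend along it, reduce to simple vector arithmetic. The sketch below is a toy under the assumption that emb_a and emb_b are L2-normalized CLIP image embeddings of the two material exemplars; the function names are hypothetical and not from MARBLE's code.

```python
import numpy as np

def normalize(v):
    # Project an embedding onto the unit sphere, as CLIP features
    # are commonly normalized before comparison.
    return v / np.linalg.norm(v)

def material_direction(emb_a, emb_b):
    # Edit direction pointing from material A toward material B
    # in embedding space (hypothetical helper for illustration).
    return normalize(emb_b - emb_a)

def blend_materials(emb_a, emb_b, alpha):
    # Interpolate between two material embeddings (alpha in [0, 1])
    # and renormalize, mimicking a blend in CLIP-space.
    return normalize((1.0 - alpha) * emb_a + alpha * emb_b)
```

In the actual method such directions condition a pre-trained text-to-image model through a specific UNet block; here they only illustrate the embedding-space geometry.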

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras. Existing works struggle to generate either large viewpoint changes or temporally smooth samples while relying on specific task configurations. Our approach overcomes these limitations through a simple model design, an optimized training recipe, and a flexible sampling strategy that generalize across view synthesis tasks at test time. As a result, our samples maintain high consistency without requiring additional 3D representation-based distillation, thus streamlining view synthesis in the wild. Furthermore, we show that our method can generate high-quality videos lasting up to half a minute with seamless loop closure. Extensive benchmarking demonstrates that Seva outperforms existing methods across different datasets and settings.

SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but are computationally expensive and the generation is often misaligned with visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach aiming to take the best of both directions. The first stage of SPAR3D generates sparse 3D point clouds using a lightweight point diffusion model, which has a fast sampling speed. The second stage uses both the sampled point cloud and the input image to create highly detailed meshes. Our two-stage design enables a probabilistic modeling of the ill-posed single-image 3D task, while maintaining high computational efficiency and great output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D demonstrates superior performance over previous state-of-the-art methods, at an inference speed of 0.7 seconds.

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement

We present SF3D, a novel method for rapid and high-quality textured mesh reconstruction from a single image. Utilizing the Large Reconstruction Model (LRM) as its foundation, SF3D achieves fast generation times with detailed, textured meshes in just 0.5 seconds. Unlike traditional approaches, SF3D is explicitly trained for mesh generation, incorporating a fast UV unwrapping technique that enables swift texture generation rather than relying on vertex colors. The method also learns material parameters and normal maps to enhance the visual quality of the reconstructed models. Furthermore, SF3D integrates a delighting step to effectively remove low-frequency illumination effects, ensuring that the reconstructed meshes can be easily used in novel illumination conditions.

Experience

Co-Head of 3D & Image

Stability AI

Leading research in generative AI for 3D.

Research Scientist

Stability AI

Research on object acquisition, the use of deep priors to reduce ambiguities, and generative AI.

Senior Research Scientist

Unity

Research on object acquisition and generative AI.

Ph.D. Student

University of Tübingen

Research on deep learning based material acquisition.

Student Researcher

Google

Research on novel techniques for material, geometry, and illumination disentanglement.

Research Intern

NVIDIA

Research on casual shape and material acquisition.

Android Developer

zahlz

Development of an Android Application for a mobile payment system.

Projects
Postergeist featured image

Postergeist

Academic poster generator that converts Markdown files to beautiful HTML posters with live preview, drag-and-drop editing, and PDF export. Also supports a Claude Code Skill to turn …

Outline.md featured image

Outline.md

A cross-platform, markdown-based hierarchical outline editor built with Flutter. Create structured documents using familiar markdown headings, export to LaTeX, and organize your …

GifDrop featured image

GifDrop

GifDrop is a desktop app that converts video to GIF and optimizes existing GIFs. Drag and drop your files, tweak quality and size, and export—no command line required. It bundles …

Citegeist featured image

Citegeist

Checks, standardizes, and upgrades .bib files automatically.

Geotex

GeoTex is a minimal Python wrapper that exposes the UV atlas generation functions from Geogram. Given a triangulated 3D mesh, it produces per-corner UV coordinates packed into a …

Recent Posts

NeRF at NeurIPS 2022

Inspired by Frank Dellaert and his excellent series on the original NeRF Explosion and the following ICCV/CVPR conference gatherings, I decided to look into creating a NeurIPS 22 rundown myself.

The papers below are all the papers I could gather by browsing through the extensive list of accepted NeurIPS papers. I mainly collected papers whose titles seemed relevant and did a brief scan of each paper, or only the abstract if the paper wasn't published at the time of writing. If I have mischaracterized or missed any paper, please send me a DM on Twitter @markb_boss or reach out via email.