From OpenAI’s 4o to Stable Diffusion, AI foundation models that create realistic images from a text prompt are now plentiful. In contrast, foundation models capable of generating full, coherent 3D online environments from a text prompt are only just emerging.
Still, it’s only a question of when, not if, these models will become readily available. Now one of Europe’s most prominent researchers in AI 3D models, Matthias Niessner, has taken an entrepreneurial leave of absence from his visual computing & AI lab at the Technical University of Munich to found a startup working in the area: SpAItial.
Formerly a cofounder at Synthesia, the realistic AI avatar startup valued at $2.1 billion, Niessner has raised a $13 million seed round, unusually large for a European startup. The round was led by Earlybird Venture Capital, a prominent European early-stage investor (a backer of UiPath and PeakGames, for instance), with participation from Speedinvest and several high-profile angels.
That round size is even more impressive when taking into account that SpAItial doesn’t have much to show the world yet other than a recently released teaser video showing how a text prompt could generate a 3D room.
But then, there’s the technical team that Niessner assembled: Ricardo Martin-Brualla, who previously worked on Google’s 3D teleconferencing platform, now called Beam; and David Novotny, who spent six years at Meta where he led the company’s text-to-3D asset generation project.
Their collective expertise will give them a fighting chance in a space that already includes some competitors with a similar focus on photorealism. There’s Odyssey, which raised $27 million and is going after entertainment use cases. But there’s also World Labs, the startup founded by AI pioneer Fei-Fei Li, and already valued at over $1 billion.
Niessner thinks this is still little competition, both compared with what exists for other types of foundation models and relative to ‘the bigger vision’ he and others are pursuing.
“I don’t just want to have a 3D world. I also want this world to behave like the real world. I want it to be interactable and [let you] do stuff in it, and nobody has really cracked that yet,” he said.
Nobody has really cracked what the demand for photorealistic 3D environments might be yet, either. The promise of a ‘trillion-dollar’ opportunity ranging from digital twins to augmented reality seems big enough to excite VCs, but it is also vague and multifaceted enough to make go-to-market strategy hard to figure out. The most obvious use case is video game creation, but these models could also have applications in entertainment, in 3D visualizations used in construction, and eventually in the real world, in areas like robotic training.