A Realizable Path to Physical AI

Published: April 03, 2026

Physical AI is still in the early phase of its industry cycle, and new paradigms are emerging one after another. Even the term “world model” already comes with nine or ten different interpretations:

Figure: What is a World Model?

Many people are currently enthusiastic about latent-space world models, which is also the direction I mainly worked on before. But in my view, most of this line of work is still essentially a wrapper around a VLM. It can improve sample efficiency and accelerate learning, but it does not produce a model with genuine spatial understanding.

My view is that the eventual paradigm for Physical AI will be a neural network whose architecture is explicitly suited to physical understanding, trained on spatial visual tokens in a way that leads to genuine physical intelligence. I believe strongly in the Bitter Lesson, so I think real physical intelligence will ultimately require a training paradigm analogous to that of language models, one that fully leverages data and compute. But the current LM and VLM paradigms are still fundamentally mismatched with true spatial understanding, which means we need to find a different path forward.

Xiaoling Zhou[周小灵]
