AI Pioneer Fei-Fei Li Unveils Real-Time Generative 'World Model' Capable of ...

Fei-Fei Li, Co-founder and CEO of World Labs (Image source: Bloomberg)

TMTPOST -- Fei-Fei Li, the Stanford University computer science professor often hailed as the “Godmother of AI,” has introduced a breakthrough generative model that could redefine how artificial intelligence understands and recreates the physical world.

Li’s startup, World Labs, announced its Real-Time Frame Model (RTFM) on Oct. 17. RTFM is a highly efficient autoregressive diffusion Transformer trained end-to-end on massive video datasets. Its key innovation is the ability to generate realistic 2D images from new viewpoints using only one or a few input images, without relying on traditional 3D representations.

Within the industry, RTFM is being described as “AI that has learned to render.” The system can model 3D geometry as well as physical effects such as reflections and shadows, and can even reconstruct real-world environments from limited photo data.

According to Li, RTFM can generate persistent, 3D-consistent scenes in real time using a single NVIDIA H100 GPU, paving the way for interactive experiences in both real and imagined virtual spaces.

“Elegant, scalable approaches will ultimately prevail in AI,” Li’s team wrote in an accompanying article. “Generative world models are ideally positioned to benefit from the exponential decline in computing costs that has driven technological progress for decades.”


In response, former Google senior engineer Rui Diao noted that RTFM’s latest breakthrough effectively resolves the long-standing scalability challenges that have hindered world models.

Spatial intelligence refers to the ability of humans or machines to perceive, understand, and interact within three-dimensional space. The concept was first introduced by American psychologist Howard Gardner in his theory of multiple intelligences, describing the brain’s capacity to form a mental model of the external spatial world and manipulate it.

Spatial intelligence enables individuals to think in three dimensions, perceive both external and internal imagery, and recreate, transform, or modify these images. This allows people to navigate environments with ease, manipulate objects at will, and generate or interpret graphical information.

Broadly, spatial intelligence encompasses not only spatial orientation but also visual discrimination and visual reasoning. For machines, it refers to the ability to process visual data in three-dimensional space, make accurate predictions, and act upon them. This allows AI systems to operate and make decisions in complex 3D environments, overcoming the limitations of traditional 2D perception.

Fei-Fei Li has noted that visual capability sparked the Cambrian explosion, and that the evolution of the nervous system gave rise to intelligence. “We want AI that can act, not just see and speak,” she emphasizes.

With the rise of a new generation of generative AI, the combination of spatial intelligence and world models has emerged as a key pathway toward artificial general intelligence (AGI). Advanced world models can reconstruct, generate, and simulate persistent, interactive, and physically accurate environments in real time, and they are poised to transform industries ranging from software to robotics.

Li and her team consider spatial intelligence and world models essential tools for overcoming AI’s technical barriers. Compared with existing technologies, they aim to maintain world model performance while reducing GPU resource requirements and enabling real-time interactions more efficiently.

Under current video architectures, generating a 60-frame-per-second 4K interactive stream would require producing over 100,000 tokens per second, roughly the length of Frankenstein or the first Harry Potter book every second. Keeping that stream persistent over an hour-long session would mean attending over a context of more than 100 million tokens, a level neither feasible nor economically viable with today’s infrastructure.
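The scale of these figures is easier to see with a quick back-of-the-envelope calculation. The sketch below takes the article's lower bound of 100,000 tokens per second at face value and assumes, purely for illustration, that the rate is constant and spread evenly across frames; the function names are hypothetical and not part of any World Labs code.

```python
# Back-of-the-envelope arithmetic for the token figures quoted above.
# Assumption: a constant rate of 100,000 tokens per second (the article's
# lower bound), divided evenly across 60 frames per second.

TOKENS_PER_SECOND = 100_000
FRAMES_PER_SECOND = 60
SECONDS_PER_HOUR = 3_600


def tokens_per_frame(tokens_per_second: int, fps: int) -> float:
    """Average number of tokens that must be produced for each frame."""
    return tokens_per_second / fps


def tokens_for_duration(tokens_per_second: int, seconds: int) -> int:
    """Total tokens produced over a session of the given length."""
    return tokens_per_second * seconds


per_frame = tokens_per_frame(TOKENS_PER_SECOND, FRAMES_PER_SECOND)
per_hour = tokens_for_duration(TOKENS_PER_SECOND, SECONDS_PER_HOUR)

print(f"Tokens per frame: about {per_frame:,.0f}")
print(f"Tokens per hour:  {per_hour:,}")
```

At the quoted lower bound, one hour comes to 360 million tokens, which is consistent with the "more than 100 million" context cited above.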

To address this, Li, alongside scholars Ben Mildenhall, Justin Johnson, and Christoph Lassner, founded World Labs in 2024 and developed RTFM, which delivers three core advantages: efficiency, scalability, and persistence.

Efficiency is demonstrated by the fact that a single NVIDIA H100 GPU can run inference at interactive frame rates. Scalability comes from the end-to-end architecture, which can keep improving as data and computational power grow. Persistence is ensured through pose-aware frame-space memory and context scheduling, allowing world scenes to “never fade away” and enabling long-term, consistent interactions in simulated environments.
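One way to picture pose-aware memory with context scheduling is a store of frames tagged with camera positions, from which only the frames nearest the current viewpoint are pulled into the model's context. The sketch below is a toy illustration under that assumption, not World Labs' implementation: the `PosedFrame` and `SpatialMemory` names are hypothetical, and it ignores camera orientation and the image data itself.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class PosedFrame:
    """A generated frame tagged with where the camera was (hypothetical type)."""
    frame_id: int
    position: Position  # camera location only; orientation is ignored here


class SpatialMemory:
    """Toy frame store that retrieves context by spatial proximity."""

    def __init__(self) -> None:
        self._frames: List[PosedFrame] = []

    def add(self, frame: PosedFrame) -> None:
        self._frames.append(frame)

    def select_context(self, query: Position, k: int) -> List[PosedFrame]:
        # Return the k stored frames closest to the query position, so the
        # context size stays bounded however long the session has run.
        return heapq.nsmallest(
            k, self._frames, key=lambda f: math.dist(f.position, query)
        )


# Usage: walk along the x-axis, then ask for context near x = 7.2.
memory = SpatialMemory()
for i in range(10):
    memory.add(PosedFrame(frame_id=i, position=(float(i), 0.0, 0.0)))

context = memory.select_context((7.2, 0.0, 0.0), k=3)
print([f.frame_id for f in context])  # prints [7, 8, 6]
```

The point of the design is that the memory can grow without bound while each generation step sees only a fixed number of nearby frames, which is one plausible reading of how a scene can persist without an ever-growing context.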


In September 2024, World Labs announced it had raised $230 million in funding, led by a16z, NEA, and Radical Ventures. The round also saw participation from Shinrai Investments LLC and the venture arms of AMD, Adobe, Databricks, and NVIDIA, whose CEO is Jensen Huang.

The company employs around 24 people, including four co-founders, among them Fei-Fei Li, with roughly one-third of the team of Chinese descent. Public reports indicate that World Labs reached a valuation of $1 billion just three months after its founding.

Looking ahead, investors say Fei-Fei Li’s team will first develop a large world model (LWM) for spatial intelligence, designed to deeply understand three-dimensional, physical, spatial, and temporal concepts. The model is expected to support augmented reality applications first and robotics later, where it could improve autonomous vehicles, automated factories, and humanoid robots.

Li has stated that the team aims to launch its first product as early as 2025, while acknowledging that many challenges remain, from business models to technical boundaries. “We are still at the very beginning,” she said, “but we believe our team will overcome these challenges.”

In parallel, Li is also developing the Behavior visual challenge, a competition intended to replicate the success of ImageNet, which helped catalyze the deep learning revolution and the broader AI boom. That track record is why Li is widely regarded as a driving force in “enabling AI to truly understand the world.”

The inspiration for Behavior arose from three major challenges in robot learning: the lack of standardized tasks, which makes comparing research difficult; the absence of a unified task framework, with many tasks being short and limited in scope; and a shortage of training data.

This October, Li officially released Behavior 1K, also known as the Behavior 1000 Challenge. It is a comprehensive simulation benchmark and training environment for embodied intelligence and robotics research, including 1,000 long-horizon tasks set in everyday household environments—real-world tasks requiring multiple steps to complete. Behavior provides an open-source training and evaluation platform, allowing researchers worldwide to train algorithms and compare results under consistent standards.

“What excites me even more is that we are at a civilizational turning point: language, spatial, visual, embodied intelligence, and other AI technologies are converging and beginning to truly transform human society,” Li said. “As long as we always keep human-centeredness at heart, these technologies can become a force for good for humanity.”

Li’s team indicated that World Labs will continue to enhance its model’s dynamic scene simulation and user interaction capabilities, and that larger-scale models are expected to deliver even stronger performance in the future.
