
AI Pioneer Fei-Fei Li Unveils Real-Time Generative 'World Model' Capable of ...



Fei-Fei Li, Co-founder and CEO of World Labs (Image source: Bloomberg)

TMTPOST -- Fei-Fei Li, the Stanford University computer science professor often hailed as the “Godmother of AI,” has introduced a breakthrough generative model that could redefine how artificial intelligence understands and recreates the physical world.

Li’s startup, World Labs, announced the launch of its Real-Time Frame Model (RTFM) on Oct. 17. RTFM is a highly efficient autoregressive diffusion Transformer trained end-to-end on massive video datasets. Its key innovation is the ability to generate realistic 2D images from new viewpoints using only one or a few input images, without relying on traditional 3D representations.

Within the industry, RTFM is being described as “AI that has learned to render.” The system can simulate physical phenomena such as 3D geometry, reflections, and shadows, and can even reconstruct real-world environments from limited photo data.

According to Li, RTFM can generate persistent, 3D-consistent scenes in real time using a single NVIDIA H100 GPU, paving the way for interactive experiences in both real and imagined virtual spaces.

“Elegant, scalable approaches will ultimately prevail in AI,” Li’s team wrote in an accompanying article. “Generative world models are ideally positioned to benefit from the exponential decline in computing costs that has driven technological progress for decades.”


In response, former Google senior engineer Rui Diao noted that RTFM’s latest breakthrough effectively resolves the long-standing scalability challenges that have hindered world models.

Spatial intelligence refers to the ability of humans or machines to perceive, understand, and interact within three-dimensional space. The concept was first introduced by American psychologist Howard Gardner in his theory of multiple intelligences, describing the brain’s capacity to form a mental model of the external spatial world and manipulate it.

Spatial intelligence enables individuals to think in three dimensions, perceive both external and internal imagery, and recreate, transform, or modify these images. This allows people to navigate environments with ease, manipulate objects at will, and generate or interpret graphical information.

Broadly, spatial intelligence encompasses not only spatial orientation but also visual discrimination and visual reasoning. For machines, it refers to the ability to process visual data in three-dimensional space, make accurate predictions, and act upon them. This allows AI systems to operate and make decisions in complex 3D environments, overcoming the limitations of traditional 2D perception.

Fei-Fei Li has noted that visual capability sparked the Cambrian explosion, and that the evolution of the nervous system gave rise to intelligence. “We want AI that can act, not just see and speak,” she emphasizes.

With the rise of a new generation of generative AI, the combination of spatial intelligence and world models has emerged as a key pathway toward artificial general intelligence (AGI). Advanced world models can reconstruct, generate, and simulate persistent, interactive, and physically accurate environments in real time, and they are poised to transform industries ranging from software to robotics.

Li and her team consider spatial intelligence and world models essential tools for overcoming AI’s technical barriers. Compared with existing technologies, they aim to maintain world model performance while reducing GPU resource requirements and enabling real-time interactions more efficiently.

Under current video architectures, generating a 60-frame-per-second 4K interactive stream would require over 100,000 tokens per second, roughly the length of Frankenstein or the first Harry Potter book. Keeping that output in context for an hour would mean handling more than 100 million tokens, which is neither technically feasible nor economically viable with today’s infrastructure.
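The arithmetic behind these figures can be checked with a short back-of-the-envelope sketch. It assumes a steady rate of exactly 100,000 tokens per second, the lower bound quoted above, and that every generated token stays in context. The per-frame figure is simply derived from that rate. The article does not give the tokenizer or an exact per-frame token count, so the function names and constants here are illustrative only.

```python
# Back-of-the-envelope token budget for a real-time 4K stream.
# TOKENS_PER_SECOND is the lower bound quoted in the article; everything
# derived from it is illustrative, not a World Labs specification.
TOKENS_PER_SECOND = 100_000
FRAMES_PER_SECOND = 60
SECONDS_PER_HOUR = 3_600


def tokens_per_frame(tokens_per_second: int, fps: int) -> float:
    """Average number of tokens available per frame at the given rate."""
    return tokens_per_second / fps


def context_tokens(tokens_per_second: int, seconds: int) -> int:
    """Total tokens accumulated if every generated token stays in context."""
    return tokens_per_second * seconds


if __name__ == "__main__":
    per_frame = tokens_per_frame(TOKENS_PER_SECOND, FRAMES_PER_SECOND)
    one_hour = context_tokens(TOKENS_PER_SECOND, SECONDS_PER_HOUR)
    print(f"~{per_frame:,.0f} tokens per frame")   # ~1,667 tokens per frame
    print(f"{one_hour:,} tokens after one hour")  # 360,000,000 tokens after one hour
```

Even at this lower-bound rate, an hour of interaction accumulates 360 million tokens, which is consistent with the "more than 100 million" figure.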

To address this, Li founded World Labs in 2024 together with researchers Ben Mildenhall, Justin Johnson, and Christoph Lassner. The company’s RTFM delivers three core advantages: efficiency, scalability, and persistence.

Efficiency means that a single NVIDIA H100 GPU is enough to run inference at interactive frame rates. Scalability comes from the end-to-end architecture, which can keep improving as data and computing power grow. Persistence comes from pose-aware frame-space memory and context scheduling, which let world scenes “never fade away” and support long-term, consistent interaction in simulated environments.
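One way to picture pose-aware memory is as a store of past frames, each tagged with the camera pose it was seen from. Only the frames nearest the current viewpoint are pulled into the model's context. The sketch below is a simplified illustration under that assumption, not World Labs' implementation. `PosedFrame`, `FrameMemory`, and `select_context` are hypothetical names, poses are reduced to 3D positions with orientation ignored, and nearness is plain Euclidean distance.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class PosedFrame:
    """A stored frame tagged with the camera position it was seen from."""
    frame_id: int
    position: Position  # orientation omitted to keep the sketch short


@dataclass
class FrameMemory:
    """Hypothetical spatial memory: past frames indexed by camera position."""
    frames: List[PosedFrame] = field(default_factory=list)

    def add(self, frame: PosedFrame) -> None:
        self.frames.append(frame)

    def select_context(self, query: Position, k: int) -> List[PosedFrame]:
        """Return the k stored frames closest to the query position."""
        ranked = sorted(self.frames, key=lambda f: math.dist(f.position, query))
        return ranked[:k]


if __name__ == "__main__":
    memory = FrameMemory()
    for i in range(100):  # a camera walking along the x axis
        memory.add(PosedFrame(i, (float(i), 0.0, 0.0)))

    context = memory.select_context((42.2, 0.0, 0.0), k=4)
    print([f.frame_id for f in context])  # [42, 43, 41, 44]
```

However many frames the memory holds, the context passed to the model stays at `k` frames. This is the property that would let a scene persist without the per-frame cost growing over time.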


In September 2024, World Labs announced it had raised $230 million in funding, led by a16z, NEA, and Radical Ventures. The round also drew participation from Shinrai Investments LLC and the venture arms of AMD, Adobe, Databricks, and NVIDIA, the chipmaker headed by CEO Jensen Huang.

The company employs around 24 people, including its four co-founders, Fei-Fei Li among them. Roughly one-third of the team is of Chinese descent. Public reports indicate that World Labs reached a valuation of $1 billion just three months after its founding.

Looking ahead, investors say Fei-Fei Li’s team will first develop a Large World Model (LWM), a large model for spatial intelligence designed to deeply understand three-dimensional, physical, spatial, and temporal concepts. The model is expected to support augmented reality applications first and then to be applied to robotics, improving autonomous vehicles, automated factories, and humanoid robots.

Li has stated that the team aims to launch its first product as early as 2025, while acknowledging that many challenges remain, from business models to technical boundaries. “We are still at the very beginning,” she said, “but we believe our team will overcome these challenges.”

In parallel, Li is also developing Behavior, a challenge competition intended to replicate the success of ImageNet, the visual benchmark that helped catalyze the deep learning revolution and the broader AI boom. That earlier work is why Li is widely regarded as a driving force in “enabling AI to truly understand the world.”

The inspiration for Behavior arose from three major challenges in robot learning: the lack of standardized tasks, which makes comparing research difficult; the absence of a unified task framework, with many tasks being short and limited in scope; and a shortage of training data.

This October, Li officially released Behavior 1K, also known as the Behavior 1000 Challenge. It is a comprehensive simulation benchmark and training environment for embodied intelligence and robotics research, comprising 1,000 long-horizon tasks set in everyday household environments, that is, real-world tasks that take multiple steps to complete. Behavior provides an open-source training and evaluation platform, allowing researchers worldwide to train algorithms and compare results under consistent standards.

“What excites me even more is that we are at a civilizational turning point: language, spatial, visual, embodied intelligence, and other AI technologies are converging and beginning to truly transform human society,” Li said. “As long as we always keep human-centeredness at heart, these technologies can become a force for good for humanity.”

Li’s team indicated that World Labs will continue to enhance its model’s dynamic scene simulation and user interaction capabilities, and that larger-scale models are expected to deliver even stronger performance in the future.
