Required reading for anyone working in AI.
This weekend, let's revisit "The Bitter Lesson", a landmark essay published in 2019 whose predictions have all come true. Its author, Rich Sutton, is the father of modern reinforcement learning.
The core claim of "The Bitter Lesson" fits in a single sentence: the two general methods, search and learning, scaled with computation, will ultimately crush every piece of clever human engineering.
At the time, the mainstream view was still "raw compute alone won't do; you have to build in human knowledge." Then GPT-3 arrived, the scaling laws were validated, the NLP pipelines linguists had spent decades designing were replaced end to end by a single Transformer, and ChatGPT exploded. Every prediction was vindicated.
The same prediction is now being validated again in the agent space.
Reasoning models have internalized search: o1 and DeepSeek-R1 need no externally designed chain of thought; the model searches over reasoning paths in token space on its own (a toy sketch of this idea follows below).
Agents like Manus go a step further (in setting their direction, they reused Sutton's conclusion: hand it to the model): the model decides on its own which tools to use, how to break down the task, and how to execute it. No hand-crafted workflow orchestration is needed.
This is exactly the judgment Sutton made six years ago: stop fussing over clever designs; general methods scaled with computation will win in the end.
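To make "searching in token space" concrete, here is a minimal best-of-N sketch in Python. Everything in it is a toy stand-in: `generate` and `score` are stubs playing the roles of a model's sampler and a verifier; this is an illustration of test-time search in general, not the actual mechanism inside o1 or DeepSeek-R1.

```python
import random

def generate(prompt: str, rng: random.Random) -> str:
    """Stub for a language model sampling one reasoning trace at temperature > 0."""
    traces = [
        "Step 1: restate the problem. Step 2: compute 6 * 7. Answer: 42",
        "Step 1: guess. Answer: 7",
        "Step 1: decompose. Step 2: 6 * 7 = 42. Step 3: check. Answer: 42",
    ]
    return rng.choice(traces)

def score(trace: str) -> float:
    """Stub verifier: prefer traces with more explicit intermediate steps."""
    return float(trace.count("Step"))

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Sample n candidate traces and keep the best one: the simplest form of
    test-time search, trading compute for quality with no hand-designed CoT."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("What is 6 * 7?"))
```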
The Bitter Lesson
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
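A back-of-the-envelope illustration of why the time horizon matters; the two-year doubling period is an assumption picked purely for illustration, not a measured figure:

```python
# Toy compounding: if compute per dollar doubles every `doubling_years`,
# how much more compute does a fixed budget buy after `years`?
def compute_multiplier(years: float, doubling_years: float = 2.0) -> float:
    return 2.0 ** (years / doubling_years)

for years in (3, 10, 20):  # one project, one research program, one career
    print(f"after {years:2d} years: {compute_multiplier(years):7.1f}x compute")
# A 3-year project sees ~2.8x more compute, but a 20-year horizon sees
# ~1024x: the gap that lets compute-leveraging methods win in the end.
```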
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
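Deep Blue's real engine was vastly more elaborate (special-purpose hardware, a hand-tuned evaluation function), but the skeleton of "massive, deep search" is just depth-limited game-tree search. A minimal negamax sketch with alpha-beta pruning, written against an assumed abstract `Game` interface rather than any real chess library:

```python
from typing import Iterable, Protocol

class Game(Protocol):
    """Abstract two-player zero-sum game; an assumed interface, not a real API."""
    def moves(self) -> Iterable["Game"]: ...   # successor positions
    def is_terminal(self) -> bool: ...
    def evaluate(self) -> float: ...           # static score from side to move

def negamax(pos: Game, depth: int, alpha: float = float("-inf"),
            beta: float = float("inf")) -> float:
    """Depth-limited negamax with alpha-beta pruning.

    'Brute force' here means: expand the game tree as deeply as the compute
    budget allows, with only a comparatively thin static evaluation at the
    leaves, rather than encoding deep human chess understanding.
    """
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()
    best = float("-inf")
    for child in pos.moves():
        # The child's score is from the opponent's view, hence the negation.
        best = max(best, -negamax(child, depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:  # the opponent would avoid this line; prune it
            break
    return best
```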
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
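A minimal sketch of self-play value learning on a deliberately trivial game (take 1 or 2 stones; whoever takes the last stone wins). Real systems pair this idea with deep networks and tree search; this only shows the core loop: play against yourself, and nudge a value estimate toward the outcome.

```python
import random

# Toy game: players alternate removing 1 or 2 stones from a pile of N;
# taking the last stone wins. We learn V[s], the estimated chance that the
# player to move with s stones left wins, via self-play TD updates.
N, ALPHA, EPS = 21, 0.1, 0.1
V = {s: 0.5 for s in range(N + 1)}
V[0] = 0.0  # your turn with 0 stones: the opponent just took the last one

def pick(s: int, rng: random.Random) -> int:
    """Epsilon-greedy move: mostly leave the opponent the worst state."""
    moves = [m for m in (1, 2) if m <= s]
    if rng.random() < EPS:
        return rng.choice(moves)
    return min(moves, key=lambda m: V[s - m])

rng = random.Random(0)
for _ in range(20000):            # many self-play games
    s = N
    while s > 0:
        m = pick(s, rng)
        if m == s:                # we take the last stone: a win from s
            V[s] += ALPHA * (1.0 - V[s])
        else:                     # our value is the opposite of the opponent's
            V[s] += ALPHA * ((1.0 - V[s - m]) - V[s])
        s -= m

# Game theory says states with s % 3 == 0 are losses for the player to move;
# the learned table should reflect that without any built-in knowledge.
print({s: round(V[s], 2) for s in range(1, 10)})
```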
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researchers' time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
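The HMM machinery can be shown in miniature with Viterbi decoding, the core computation of the statistical approach. All states, symbols, and probabilities below are invented for illustration and have nothing to do with any real acoustic model:

```python
# Toy HMM Viterbi decoding. All numbers here are made up for illustration.
states = ("vowel", "consonant")
start = {"vowel": 0.5, "consonant": 0.5}
trans = {"vowel": {"vowel": 0.3, "consonant": 0.7},
         "consonant": {"vowel": 0.6, "consonant": 0.4}}
emit = {"vowel": {"a": 0.6, "t": 0.1, "k": 0.3},
        "consonant": {"a": 0.1, "t": 0.5, "k": 0.4}}

def viterbi(obs):
    """Most probable hidden-state sequence for the observed symbols."""
    prob = {s: start[s] * emit[s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            # Best predecessor state to transition into s from.
            p = max(states, key=lambda q: prob[q] * trans[q][s])
            new_prob[s] = prob[p] * trans[p][s] * emit[s][o]
            new_path[s] = path[p] + [s]
        prob, path = new_prob, new_path
    return path[max(states, key=lambda s: prob[s])]

print(viterbi(["t", "a", "k"]))  # ['consonant', 'vowel', 'consonant']
```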
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
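What "only the notion of convolution" means operationally: slide a small filter over the input and record its response at each location. A bare-bones 2D version, with a hand-picked edge kernel standing in for the weights a network would actually learn:

```python
# Bare-bones 2D convolution (stride 1, no padding). In a deep network the
# kernel weights are learned; here a hand-picked vertical-edge filter
# stands in, purely to show the operation itself.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge = [[-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]]  # responds strongly at the vertical boundary
print(conv2d(image, edge))  # [[3, 3], [3, 3]]
```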
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.