
DeepSeek Unveils Its Most Powerful Open-source Innovation, Challenging GPT-5...



Credit: Unsplash

On the occasion of ChatGPT's third birthday, its competitor DeepSeek showed up with a “birthday gift” that seems a little too competitive, as if unwilling to let the pioneer of large language models enjoy an easy celebration.

On the evening of December 1, DeepSeek unveiled two official models—DeepSeek-V3.2 and DeepSeek-V3.2-Speciale—in one go. The accompanying technical paper reveals that these models have achieved world-leading reasoning capabilities.

According to DeepSeek, the newly updated “regular lineup” V3.2—now available on the web, app, and via API—strikes a balance between reasoning ability and output length, making it well-suited for everyday use.

In benchmark reasoning tests, V3.2, GPT-5, and Claude 4.5 showed varied strengths across different domains; only Gemini 3 Pro delivered noticeably stronger overall performance than the other three.


Source: DeepSeek official WeChat

Meanwhile, DeepSeek also stated that compared to the recently released Kimi-K2-Thinking from the domestic large model developer Moonshot AI, DeepSeek V3.2 has significantly reduced output length, which greatly decreases computational overhead and user wait time. In agent benchmarking, V3.2 also outperformed other open-source models such as Kimi-K2-Thinking and MiniMax M2, making it the strongest open-source large model to date. Its overall performance is now extremely close to that of the top closed-source models.


Image from DeepSeek official WeChat

What’s even more noteworthy is V3.2’s performance in certain Q&A scenarios and general agent tasks. In a specific case involving travel advice, for example, V3.2 leveraged deep reasoning along with web crawling and search engine tools to provide highly detailed and accurate travel tips and recommendations. The latest API update for V3.2 also supports tool usage in “thinking mode” for the first time, greatly enriching the usefulness and breadth of answers users receive.
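Tool use in thinking mode can be sketched as a standard function-calling request to an OpenAI-compatible chat API. This is only an illustration of the request shape: the model name `deepseek-reasoner`, the endpoint behavior, and the `web_search` tool schema below are assumptions, not details confirmed by the article.

```python
# Hedged sketch of a tool-calling request to an OpenAI-compatible chat API.
# The model name "deepseek-reasoner" and the "web_search" tool are
# illustrative assumptions, not details confirmed by the article.
import json

payload = {
    "model": "deepseek-reasoner",  # assumed name of the thinking-mode model
    "messages": [
        {"role": "user", "content": "Plan a 3-day trip to Kyoto in spring."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",  # hypothetical search tool
                "description": "Search the web and return the top results.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }
    ],
}

# In thinking mode the model may reason, then either answer directly or emit
# a tool call; the caller runs the tool and returns its output as a "tool"
# message, after which the model continues reasoning with the new evidence.
print(json.dumps(payload["tools"][0]["function"]["name"]))
```

The point of the pattern is that reasoning and tool execution interleave: the model decides mid-thought when it needs fresh information, which is what enables the detailed travel answers described above.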

In addition, DeepSeek specifically emphasized that V3.2 was not specially trained on the tools featured in these evaluation datasets.

We've observed that while benchmark scores for large models keep climbing, these models often make basic factual errors in everyday interactions (a criticism leveled especially at GPT-5 upon its release). Against this backdrop, DeepSeek has emphasized with each update that it avoids relying solely on correct answers as a reward signal. As a result, it has not produced a so-called "super-intelligent brain" that looks clever on benchmarks yet fails at the simple tasks and questions that matter to ordinary users: a "low-EQ" AI agent.

Overcoming this challenge at a fundamental level—becoming a large model with both high IQ and high EQ—is the key to developing a truly versatile, reliable, and efficient AI agent. DeepSeek also believes that V3.2 can demonstrate strong generalization capabilities in real-world application scenarios.

To strike a balance between computational efficiency, powerful reasoning, and agent performance, DeepSeek has implemented optimizations across the training, integration, and application layers. According to its technical paper, V3.2 introduces DSA (DeepSeek Sparse Attention), which significantly reduces computational complexity in long-context scenarios while maintaining model performance.
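The article does not spell out DSA's exact mechanism, but the general idea behind sparse attention can be sketched: each query attends to only a small top-k subset of keys instead of all of them, cutting the attention work from roughly O(L²) toward O(L·k) for context length L. The NumPy sketch below is a generic top-k illustration of that idea, not DeepSeek's actual DSA implementation.

```python
# Generic top-k sparse attention sketch (NOT DeepSeek's actual DSA):
# each query keeps only its topk highest-scoring keys before the softmax.
import numpy as np

def topk_sparse_attention(q, k, v, topk):
    # Dense scores: (n_queries, n_keys), scaled by sqrt(head_dim).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Indices of the topk largest scores per query row.
    idx = np.argpartition(scores, -topk, axis=-1)[:, -topk:]
    # Mask everything else to -inf so it gets zero softmax weight.
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # Softmax over the surviving scores only.
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
out = topk_sparse_attention(q, k, v, topk=4)  # each query reads 4 of 16 keys
print(out.shape)  # (4, 8)
```

With topk equal to the number of keys this reduces to ordinary dense attention, which is a convenient sanity check; the savings come from choosing topk much smaller than the context length.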

At the same time, to integrate reasoning capabilities into tool-using scenarios, DeepSeek has developed a new synthesis pipeline that enables systematic, large-scale generation of training data. This approach facilitates scalable agent post-training optimization, substantially improving generalization in complex, interactive environments as well as the model’s ability to follow instructions.

In addition, as mentioned earlier, V3.2 is also the first model from DeepSeek to incorporate reasoning into tool usage, greatly enhancing the model’s generalization capabilities.

If the focus of V3.2 is on “saying things that make sense and getting things done”—a balance-seeking approach for practical intelligent agents—then the positioning of the “Special Forces” V3.2 Speciale is to push the reasoning ability of open-source models to the limit and explore the boundaries of model capabilities through extended reasoning.

It’s worth noting that a major highlight of V3.2 Speciale is its integration of the theorem-proving capabilities from DeepSeek-Math-V2, the most powerful mathematical large model released just last week.

Math-V2 not only achieved gold-medal-level performance in the 2025 International Mathematical Olympiad and the 2024 China Mathematical Olympiad, but also outperformed Gemini 3 in the IMO-Proof Bench benchmark evaluation.

Moreover, in a similar vein to previously discussed approaches, this mathematical model is also striving to overcome the limitations of correct-answer reward mechanisms and the so-called “test-solver” identity by adopting a self-verification process. In doing so, it seeks to break through the current bottlenecks in AI’s deep reasoning, enabling large models to truly understand mathematics and logical derivations; as a result, it aims to achieve more robust, reliable, and versatile theorem-proving capabilities.

With its greatly enhanced reasoning abilities, V3.2 Speciale has achieved Gemini 3.0 Pro-level results in mainstream reasoning benchmarks. However, V3.2 Speciale’s performance advantages come at the cost of consuming a large number of tokens, which significantly increases its operational costs. As a result, it currently does not support tool calls or everyday conversation and writing, and is intended for research use only.

From OCR to Math-V2, then to V3.2 and V3.2 Speciale, each of DeepSeek’s recent product launches has been met with widespread praise. At the same time, these releases have not only brought significant improvements in overall capabilities, but also continually clarified the main development trajectories of “practicality” and “generalization”.

In the second half of 2025, with GPT-5, Gemini 3, and Claude Opus 4.5 launching one after another—each outperforming the last in benchmark tests—and with DeepSeek rapidly catching up, the race to be crowned the “most powerful large model” is already getting crowded. Leading large models are now showing clear distinctions in their training approaches as well as their unique characteristics in real-world performance, setting the stage for an even more exciting competition among large models in 2026. (Author|Hu Jiameng, Editor|Li Chengcheng)

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.

Source: 钛媒体APP (TMTPost), an independent finance and technology media outlet · 2026-01-07 22:36:49