China’s Moonshot AI Unveils Kimi K2 Thinking to Take on GPT-5 and Gemini

When Moonshot AI rolled out its newest large language model, Kimi K2 Thinking, it wasn’t just another product announcement: it was a declaration of intent.

For China’s fast-rising AI champion, the launch marks a dramatic re-entry into the global race for artificial intelligence dominance. The company describes its model as a milestone in “reasoning intelligence,” capable of chaining hundreds of logical steps and tool calls with minimal human supervision.

To enthusiasts in China’s tech circles, the debut felt cinematic. As one social-media commentator put it, “The treasure island of Monte Cristo has reappeared—the prisoner has returned, this time with a plan that shocks the world.”

Moonshot AI’s comeback lands just weeks ahead of a crowded lineup of heavyweight releases: Google’s Gemini 3, OpenAI’s expected GPT-5.1, and DeepSeek’s new generation of open-source models. Yet it is Moonshot AI that has grabbed global headlines first.

A Benchmark Moment for China’s AI Ambitions

The new model has quickly become one of the most talked-about developments in the AI community. Thomas Wolf, co-founder of open-source platform Hugging Face, summed up the sentiment on X: “Is this another ‘DeepSeek moment,’ where open source once again outpaces closed source?”

When DeepSeek’s open-source R1 model briefly surpassed OpenAI’s o1 on reasoning benchmarks earlier this year, it marked a symbolic victory for open development. Moonshot AI is now aiming higher, positioning Kimi K2 Thinking directly against closed-source leaders like GPT-5 and Anthropic’s Claude Sonnet 4.5.

While analysts acknowledge that K2 Thinking still has rough edges, few dispute its importance. For a company that some doubted could keep pace after DeepSeek’s surge, the new release restores Moonshot AI’s standing among the world’s top model developers.

“Kimi K1.5 was exploration. K2 showed technical maturity. K2 Thinking cements confidence—inside and outside the company,” one industry investor told CNBC. “It proves Moonshot AI still belongs in the first echelon.”

Much of the early buzz has centered on cost. Rumors circulated that training K2 Thinking required only $4.6 million—a fraction of the hundreds of millions reportedly spent by U.S. rivals.

In an online AMA on Reddit on November 11, Moonshot AI’s founder Yang Zhilin, joined by partners Zhou Xinyu and Wu Yuxin, addressed the speculation head-on.

“That number isn’t official,” Yang said. “Training cost can’t be captured by a single figure—it includes exploration, failed experiments, and endless iteration.”

The team explained that what mattered wasn’t dollars spent, but how efficiently every GPU was pushed. Moonshot uses InfiniBand-connected H800 GPUs, hardware that lags the top U.S. systems but, as engineers put it, “was driven to its limits.”

K2 Thinking’s most unconventional choice may be its optimizer. Instead of relying on established algorithms, Moonshot adopted Muon, a largely untested optimizer. The decision raised eyebrows, but the team insists it followed rigorous scaling-law validation and small-scale testing before full deployment.

“Before Muon, we eliminated dozens of other optimizers,” said Zhou. “By the time we scaled up, we knew the risk profile intimately.”
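
To make that concrete, here is a minimal NumPy sketch of a Muon-style update as it is described in public references: plain momentum accumulation followed by a Newton-Schulz iteration that orthogonalizes the update for a 2D weight matrix. The iteration coefficients, learning rate, and momentum constant are illustrative assumptions, not Moonshot’s production settings.

```python
# Minimal sketch of a Muon-style optimizer step (illustrative, not Moonshot's code).
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Push G's singular values toward 1 (i.e. toward the nearest semi-orthogonal
    matrix) with a quintic Newton-Schulz iteration, avoiding an explicit SVD."""
    a, b, c = 3.4445, -4.7750, 2.0315            # commonly cited iteration coefficients
    X = G / (np.linalg.norm(G) + 1e-7)           # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                               # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style step for a single 2D weight matrix."""
    momentum = beta * momentum + grad            # plain momentum accumulation
    weight = weight - lr * newton_schulz_orthogonalize(momentum)
    return weight, momentum

# Toy usage on a random matrix.
rng = np.random.default_rng(0)
W, M = rng.normal(size=(64, 128)), np.zeros((64, 128))
W, M = muon_step(W, rng.normal(size=(64, 128)), M)
```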

On data strategy, Moonshot offered a rare look into its training philosophy. “Finding the right dataset is an art,” one engineer said during the AMA. “Different data sources interact in complex ways—intuition matters, but evidence decides.”

The company declined to disclose dataset details but emphasized that each architectural change underwent strict ablation testing before scaling. “If the model shows any instability, scaling stops immediately,” Wu noted.

K2 Thinking currently supports text-based interaction only, a deliberate decision. Video and multimodal models demand vastly higher data preparation and training resources, the team said. A million-token context window has already been tested but is temporarily withheld because of cost. “It’ll likely return in future releases,” Yang added.

Many early users have praised Kimi K2 Thinking for its natural prose style—balanced, coherent, and sometimes poetic. According to the company, this reflects a mix of strong pre-training foundations and targeted fine-tuning during reinforcement learning.

“The tone and rhythm of a model reflect the taste of the team behind it,” Yang said.

Still, some testers have complained the model feels overly cautious or “too positive” in combative dialogues. The team concedes the point. “It’s a persistent challenge to reduce unnecessary filtering while maintaining safety,” Zhou said. The company is even open to revisiting policies on mature content if robust age-verification systems are implemented.

Where K2 Thinking truly stands out is in reasoning depth. It can complete 200 to 300 sequential tool calls in a single chain, sustaining coherent logic throughout. That’s a major step toward practical “agentic reasoning,” where models plan, act, and adjust autonomously.

Moonshot credits an end-to-end agent reinforcement learning approach combined with INT4 inference, which accelerates long reasoning sequences without degrading accuracy.

This capability puts K2 Thinking squarely in competition with models like Anthropic’s Claude, known for long-term planning and adaptive problem solving. “We’ve lowered the entry barrier for deep reasoning,” Yang said.
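
As a rough illustration of what such an agentic chain looks like in practice, the sketch below shows a plan-act-observe loop in Python: the model either requests a tool call or emits a final answer, and every observation is fed back into the context until the step budget runs out. The `call_model` function, the tool registry, and the message format are hypothetical placeholders rather than Moonshot’s actual API.

```python
# Minimal agentic tool-call loop (hypothetical interfaces, not Moonshot's API).
import json

# Stand-in tools; a real agent would expose search, browsing, code execution, etc.
TOOLS = {
    "lookup": lambda query: f"stub results for {query!r}",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def call_model(messages):
    """Placeholder for a chat-completion call. It should return either a JSON tool
    request such as {"tool": "lookup", "input": "..."} or a final answer prefixed
    with 'FINAL:'."""
    raise NotImplementedError("wire this to a real inference endpoint")

def run_agent(task, max_steps=300):
    """Plan-act-observe loop; the 300-step budget mirrors the reported 200-300 calls."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        request = json.loads(reply)                      # the model's chosen tool call
        observation = TOOLS[request["tool"]](request["input"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "tool", "content": observation})
    return "step budget exhausted"
```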

The company also revealed research on a new architecture called KDA (Kimi Delta Attention), slated for the next-generation K3 model. KDA is designed to balance massive context windows with faster throughput, signaling Moonshot’s continued focus on efficiency rather than raw parameter scale.

A Trillion-Parameter Powerhouse

According to Moonshot’s technical documentation, Kimi K2 Thinking is its most powerful open-source reasoning model to date, featuring 1 trillion parameters and a 384-expert Mixture-of-Experts (MoE) structure.
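
For readers unfamiliar with the term, the sketch below shows how a sparse Mixture-of-Experts layer routes a single token: a router scores every expert, only the top few are actually executed, and their outputs are mixed with normalized router weights. The 384-expert count matches the reported configuration; the number of active experts per token and the toy expert functions are illustrative assumptions.

```python
# Minimal sketch of sparse MoE routing for one token (illustrative configuration).
import numpy as np

def moe_forward(x, router_w, experts, top_k=8):
    """Score every expert, run only the top_k, and combine their outputs with
    softmax-normalized router weights."""
    logits = router_w @ x                           # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]            # indices of the selected experts
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()                        # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Toy setup: 384 experts (the reported expert count), each a small random linear map.
rng = np.random.default_rng(0)
d = 16
experts = [(lambda x, W=rng.normal(size=(d, d)) / d: W @ x) for _ in range(384)]
router_w = rng.normal(size=(384, d))
output = moe_forward(rng.normal(size=d), router_w, experts)
```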

It has achieved industry-leading scores on multiple reasoning benchmarks: 44.9% on Humanity’s Last Exam with tools, 60.2% on BrowseComp, and 71.3% on SWE-Bench Verified. Those figures place it in the same competitive band as the newest Western models.

More impressively, the system sustains hundreds of reasoning steps without manual correction. In one demonstration, it solved a PhD-level mathematics problem through 23 rounds of reasoning and tool use, showcasing multi-stage planning and self-correction rarely seen outside research labs.

K2 Thinking also excels in coding tasks, particularly in front-end development using HTML and React. It can translate ideas into working interfaces, automatically debugging and adjusting in real time. The model performs well in agent-based coding environments, where it collaborates with other software agents to handle complex, multi-phase workflows.

Large reasoning models typically struggle with latency and memory overhead. Moonshot tackled the issue with Quantization-Aware Training (QAT) during post-training, applying INT4 weight-only quantization to the MoE components.

The result: near-native accuracy with roughly double the generation speed and lower GPU usage—crucial for commercial scalability.

“Reasoning-oriented models have long decoding lengths, which makes quantization tricky,” explained Wu. “But with QAT we preserve quality while cutting cost. That’s the kind of engineering efficiency this era demands.”
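
For readers unfamiliar with QAT, the sketch below shows the core “fake quantization” trick behind INT4 weight-only quantization: during training the weights are rounded to 4-bit levels and immediately de-quantized, so the forward pass already sees the rounding error the deployed model will have. The group size and the symmetric [-7, 7] range are illustrative choices, not Moonshot’s published recipe.

```python
# Minimal sketch of INT4 weight-only fake quantization for QAT (illustrative settings).
import numpy as np

def fake_quant_int4(w, group_size=32):
    """Round each group of weights to symmetric INT4 levels with a per-group scale,
    then de-quantize, so training sees the same error as INT4 inference."""
    flat = w.reshape(-1, group_size)                        # assumes size divisible by group_size
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0   # per-group scale
    scale = np.where(scale == 0, 1.0, scale)                # avoid division by zero
    q = np.clip(np.round(flat / scale), -7, 7)              # the 4-bit codes kept at deployment
    return (q * scale).reshape(w.shape)

# During QAT, MoE expert weights would pass through fake_quant_int4 on every forward
# pass; at inference time only the 4-bit codes and per-group scales are stored.
weights = np.random.default_rng(0).normal(size=(256, 64))
dequantized = fake_quant_int4(weights)
```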

For years, the AI arms race was defined by model size—more parameters, more power. Moonshot AI’s latest release suggests that the frontier has shifted. The new competition centers on inference efficiency, reasoning coherence, and usability.

Analysts say the approach echoes a broader trend across the industry: focusing less on raw scale and more on intelligent design. “The big players are learning that trillion-parameter bragging rights mean little if latency kills adoption,” said a Beijing-based AI investor.

Moonshot’s challenge is clear. Maintaining momentum will require proving that K2 Thinking can match Western models not only in benchmark tests but also in enterprise adoption. Companies across finance, manufacturing, and education are already experimenting with agent-style AI systems that automate planning and analysis.

The competition is fierce. OpenAI’s upcoming GPT-5.1 is rumored to integrate advanced multimodal reasoning, while Google’s Gemini 3 aims for tighter integration with search and workspace tools. DeepSeek, the open-source rival that shook the market earlier this year, is also preparing its next upgrade.

“In this new phase, it’s not just about who trains the biggest model,” said an industry analyst. “It’s about who can balance depth of technology, engineering efficiency, and ecosystem strategy.”

Moonshot AI appears keenly aware of that equation. Its mix of pragmatic engineering and bold experimentation has made it one of the few Chinese firms still considered contenders on the global stage.

Kimi K2 Thinking may not instantly dethrone GPT-5 or Claude, but it demonstrates that the world’s most ambitious AI work is no longer confined to Silicon Valley.

Moonshot’s engineers say the next generation, K3, will feature the new KDA architecture and possibly multimodal capabilities. They’re also considering selective open sourcing—particularly in alignment and safety components—to foster community research while preventing misuse.

For now, K2 Thinking stands as both a technological statement and a philosophical one: that in the evolving AI era, innovation is less about sheer power and more about how intelligently that power is managed.

As Yang put it at the close of the AMA: “AI isn’t just about thinking faster—it’s about thinking better. With Kimi K2 Thinking, we want to prove that better thinking can come from anywhere.”
