LLM research: In social settings, does AI stand by its partners? Can humans trust it?



Basic information

Title:Playing repeated games with large language models

Published: 2025.5.8

Journal: Nature Human Behaviour

Impact factor: 15.9

TL;DR

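GPT-4 performs very well in games that reward self-interested decisions (such as the Prisoner's Dilemma) and shows a strong tendency toward "rational self-protection". In games that require coordination and taking turns to give way (such as the Battle of the Sexes), however, it performs poorly and shows little strategic flexibility or social adaptability. Notably, these behaviour patterns are not accidental products of a particular prompt: they remain consistent across robustness checks. Fortunately, with social chain-of-thought (SCoT) prompting, a method the authors propose, GPT-4 coordinates markedly better, and human players more often mistake it for a flesh-and-blood opponent.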

Motivation

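People increasingly rely on large language models (LLMs) for information, decision support, and everyday communication, and these models are becoming a common counterpart in human interaction. This trend raises a fundamental question: can we trust LLMs such as GPT-4? Do they have enough social intelligence to act as reliable partners and collaborate effectively with humans or other agents? And when the interactions are repeated, and may pit self-interest against collective interest, is AI of this kind still trustworthy?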

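Against this background, the authors use game-playing behaviour, a classic framework for studying social decisions, to systematically examine how LLMs behave when they must cooperate and coordinate. The aim is to assess whether they show human-like rationality and predictability in complex social interactions and, by better understanding how these models work, to lay a behavioural-science foundation for their wider use.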

Experimental design

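The authors use a classic experimental paradigm, the 2×2 game: in each round, each player chooses one of two options, and the game is repeated over a fixed number of rounds. Two of the most representative games in this family are the Prisoner's Dilemma and the Battle of the Sexes: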

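The Prisoner's Dilemma is a classic game. Imagine that you and your partner have been arrested for a crime. The police interrogate you separately and make each of you the following offer: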

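If you betray your partner and your partner stays silent, you go free immediately and your partner gets 10 years (and vice versa);
If you both betray each other, you each get 5 years;
If you both stay silent, there is not enough evidence and you each get 1 year.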

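Now the question: do you dare to trust your partner not to betray you? Suppose you would like to cooperate (both stay silent, one year each) but worry that your partner will betray you, leaving you with the worst outcome. You could also betray first to make sure you don't lose out. This is the "dilemma": a rational player should defect, because defecting is better whatever the other player does; but if both players reason this way, they end up worse off than if both had cooperated.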
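To make the dilemma concrete, here is a minimal Python sketch that checks the "defecting is always better" argument with the prison sentences from the story above (years in prison, so lower is better). These numbers come from the illustrative story, not necessarily from the paper's payoff tables, and the names `SENTENCE` and `best_response` are hypothetical.

```python
# Years in prison for (my_move, their_move) -> (my_years, their_years); lower is better.
# Illustrative numbers from the story above, not necessarily the paper's payoffs.
# "C" = stay silent (cooperate), "D" = betray (defect).
SENTENCE = {
    ("C", "C"): (1, 1),
    ("C", "D"): (10, 0),
    ("D", "C"): (0, 10),
    ("D", "D"): (5, 5),
}


def best_response(their_move: str) -> str:
    """Return the move that minimizes my own sentence, given the other player's move."""
    return min("CD", key=lambda mine: SENTENCE[(mine, their_move)][0])


if __name__ == "__main__":
    for theirs in "CD":
        print(f"If the other player plays {theirs}, my best response is {best_response(theirs)}")
    # Both players following the dominant strategy ends in mutual defection...
    print("Mutual defection:", SENTENCE[("D", "D")])
    # ...which is worse for both than mutual cooperation.
    print("Mutual cooperation:", SENTENCE[("C", "C")])
```

Defecting is the best response in both cases, yet mutual defection (5 years each) is worse for both players than mutual cooperation (1 year each).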

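The Battle of the Sexes is another interesting game. Put simply, you and your partner are deciding where to go on a date: you like football, your partner likes ballet. You both want to go together; going separately, each to your own favourite, makes neither of you happy.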

The rules are:

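If you both go to the football: you are very happy, your partner is somewhat happy;
If you both go to the ballet: you are somewhat happy, your partner is very happy;
If you go to different places: neither of you is happy.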
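To see why there is no single "right" answer, here is a minimal Python sketch that checks which joint choices are stable in a single round (pure-strategy Nash equilibria: no player gains by switching alone). The numbers 10 ("very happy"), 7 ("somewhat happy") and 0 ("unhappy") are assumed for illustration; the article gives only the verbal rules.

```python
from itertools import product

# Assumed numbers for the verbal rules: 10 = very happy, 7 = somewhat happy, 0 = unhappy.
# Player 1 prefers football ("F"); player 2 prefers ballet ("B").
PAYOFF = {
    ("F", "F"): (10, 7),
    ("B", "B"): (7, 10),
    ("F", "B"): (0, 0),
    ("B", "F"): (0, 0),
}
MOVES = ("F", "B")


def pure_nash_equilibria():
    """List move pairs where neither player can do better by deviating unilaterally."""
    equilibria = []
    for a, b in product(MOVES, MOVES):
        u1, u2 = PAYOFF[(a, b)]
        p1_stable = all(PAYOFF[(alt, b)][0] <= u1 for alt in MOVES)
        p2_stable = all(PAYOFF[(a, alt)][1] <= u2 for alt in MOVES)
        if p1_stable and p2_stable:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria())  # → [('F', 'F'), ('B', 'B')]
```

Each of the two equilibria favours a different player, so the players need some convention, such as taking turns, to settle on one.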

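So here is the question: who gives way this time, you or your partner? Can you take turns accommodating each other? This game has no single "optimal" answer; it is an interaction that requires building mutual understanding and a way to coordinate. Humans often take turns, for example "your choice today, mine tomorrow". Whether LLMs can learn to take turns, cooperate, and compromise, and so resolve "cooperation under conflicting preferences" the way humans do, has been unclear.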
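The value of taking turns can be checked with a small repeated-game simulation. This Python sketch uses the same assumed numbers (10, 7, 0) and an assumed length of 10 rounds; the strategy functions are hypothetical illustrations, not the agents used in the paper.

```python
# Assumed payoffs: 10 for your preferred joint outing, 7 for the other's, 0 for a mismatch.
PAYOFF = {
    ("F", "F"): (10, 7),
    ("B", "B"): (7, 10),
    ("F", "B"): (0, 0),
    ("B", "F"): (0, 0),
}


def play(strategy1, strategy2, rounds=10):
    """Play a repeated game; each strategy maps the round index to "F" or "B"."""
    totals = [0, 0]
    for t in range(rounds):
        u1, u2 = PAYOFF[(strategy1(t), strategy2(t))]
        totals[0] += u1
        totals[1] += u2
    return tuple(totals)


def take_turns(t):
    """Football on even rounds, ballet on odd rounds."""
    return "F" if t % 2 == 0 else "B"


def always_football(t):
    """Player 1 always insists on their own favourite."""
    return "F"


def always_ballet(t):
    """Player 2 always insists on their own favourite."""
    return "B"


print(play(take_turns, take_turns))          # → (85, 85)
print(play(always_football, always_ballet))  # → (0, 0)
```

Under these assumptions, alternating gives both players the same total, while two players who each insist on their own favourite never meet and score nothing.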

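In the experiments, the authors set up several kinds of matchups: LLMs against each other (e.g. GPT-4, Claude 2, LLaMA 2), LLMs against simple hand-coded strategies, and LLMs against human players. To better understand how the models make decisions, they also used different prompt settings, including: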

Basic prompt: the model is given only the game rules and the history of play, to observe the choices it makes on its own;

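Behavioural hint: the model is told that the other player may make mistakes, simulating realistic human error, to test whether the model is willing to forgive;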

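Social chain-of-thought (SCoT) prompting, one of the paper's key contributions: the model first predicts the other player's next move and then makes its own choice based on that prediction. This design steers the model toward human-like social reasoning.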
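The three prompt settings differ only in what text the model sees before it chooses. The Python sketch below builds that text; the wording, the option labels A and B, and all function names are hypothetical, since the paper's exact prompts are not reproduced in this article, and the actual model call is left out.

```python
# Hypothetical rules text; the paper's real prompt wording is not reproduced here.
RULES = ("You are playing a repeated game with another player. In each round, "
         "each of you chooses option A or option B, and your points depend on both choices.")


def format_history(history):
    """Render past rounds as text; history is a list of (my_move, their_move) pairs."""
    lines = [f"Round {i + 1}: you chose {mine}, the other player chose {theirs}."
             for i, (mine, theirs) in enumerate(history)]
    return "\n".join(lines) if lines else "No rounds played yet."


def base_prompt(rules, history):
    """Basic setting: only the rules and the history of play."""
    return f"{rules}\n\n{format_history(history)}\n\nWhich option do you choose?"


def hint_prompt(rules, history):
    """Behavioural hint: also tell the model that the other player may make mistakes."""
    hint = "Note: the other player may occasionally make mistakes."
    return f"{rules}\n{hint}\n\n{format_history(history)}\n\nWhich option do you choose?"


def scot_prompts(rules, history):
    """SCoT setting: a prediction prompt, plus a function that builds the choice prompt."""
    context = f"{rules}\n\n{format_history(history)}"
    predict = f"{context}\n\nFirst, predict which option the other player will choose next round."

    def choose(prediction):
        # The second prompt conditions the choice on the model's own prediction.
        return (f"{context}\n\nYou predicted the other player will choose {prediction}. "
                "Given this prediction, which option do you choose?")

    return predict, choose


predict, choose = scot_prompts(RULES, [("A", "B")])
print(predict)
print(choose("B"))
```

This sketch splits SCoT into two calls, one for the prediction and one for the choice; that split is an assumption, and a single prompt asking for both steps in order would follow the same idea.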

Hypotheses

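Given these two games, a natural question is: when LLMs face repeated games, how do they choose their strategies? Do they care only about their own score, or do they try to cooperate and coordinate with others?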

The authors propose three core hypotheses:

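LLMs perform well in self-interested games (such as the Prisoner's Dilemma);
LLMs perform poorly in games that require coordination (such as the Battle of the Sexes);
Their behaviour can be improved with specific prompting methods (for example, asking the model to predict the other player's intentions).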

Key findings

Finding 1: GPT-4 excels in self-interested games, especially those in the Prisoner's Dilemma family

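In the classic Prisoner's Dilemma, which assesses how agents cooperate and defect, the authors find that GPT-4 retaliates repeatedly: once the other player defects a single time, GPT-4 stops cooperating and never forgives, even if the other player cooperates in every round afterwards. Because such retaliation can be the equilibrium strategy for an individual player, GPT-4 does well in these games precisely by being unforgiving and selfish.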

In the canonical Prisoner’s Dilemma, which assesses how agents cooperate and defect, we find that GPT-4 retaliates repeatedly, even after having experienced only one defection. Because this can indeed be the equilibrium individual-level strategy, GPT-4 is good at these games because it is particularly unforgiving and selfish.

GPT-4 never cooperates again when playing with an agent that defects once but then cooperates on every round thereafter. Thus, GPT-4 seems to be rather unforgiving in this set-up.
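The behaviour described above matches what game theorists call a "grim trigger" strategy: cooperate until the other player defects once, then defect for the rest of the game. The Python sketch below models that reported behaviour with a hand-written rule (it does not call GPT-4) against an opponent that defects once and then cooperates, as in Fig. 3b. Placing the defection in round 1 and playing 10 rounds are assumptions for illustration.

```python
def grim_trigger(opponent_history):
    """Cooperate until the opponent has defected at least once, then always defect."""
    return "D" if "D" in opponent_history else "C"


def defect_once_then_cooperate(round_index):
    """Hypothetical opponent: defects in the first round only, cooperates afterwards."""
    return "D" if round_index == 0 else "C"


def simulate(rounds=10):
    """Play the two strategies against each other and return both move sequences."""
    my_moves, their_moves = [], []
    for t in range(rounds):
        # Each player moves without seeing the other's move in the current round.
        my_moves.append(grim_trigger(their_moves))
        their_moves.append(defect_once_then_cooperate(t))
    return my_moves, their_moves


mine, theirs = simulate()
print("".join(mine))    # → CDDDDDDDDD
print("".join(theirs))  # → DCCCCCCCCC
```

A single early defection locks the rule into defection for every remaining round, which is the "never forgives" pattern the authors report.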

Fig. 3: Overview of the Prisoner’s Dilemma. a, Heat maps showing the player 1 defection rate in each combination of players and the scores accrued by player 1 in each game. b, Example gameplays between GPT-4 and an agent that defects once and then cooperates, and between GPT-4 and text-davinci-003. These games are also highlighted in red in the heat maps.

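Finding 2: GPT-4 performs poorly in games that require coordinating preferences (such as the Battle of the Sexes) and fails to establish a way to cooperate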

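GPT-4 cannot coordinate with simple, human-like agents: it does not pick up on their turn-taking pattern and keeps insisting on its own preference. GPT-4 is bad at these games because it is uncoordinated.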

GPT-4 does not manage to coordinate with simple, human-like agents that alternate between options over trials. Thus, GPT-4 is bad at these games because it is uncoordinated.

GPT-4 seemingly does not adjust its choices to the other player but instead keeps choosing its preferred option.
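A toy calculation shows how much coordination a player loses by ignoring the other side. In this Python sketch, player 1 always picks its preferred option, football, while a hypothetical human-like agent alternates between football and ballet, starting with football; the starting option and the 10-round length are assumptions.

```python
def always_preferred(round_index):
    """Player 1 keeps choosing its preferred option, football ("F")."""
    return "F"


def alternating(round_index):
    """Hypothetical human-like agent: F, B, F, B, ..."""
    return "F" if round_index % 2 == 0 else "B"


def coordination_rate(p1, p2, rounds=10):
    """Fraction of rounds in which both players picked the same option."""
    hits = sum(p1(t) == p2(t) for t in range(rounds))
    return hits / rounds


print(coordination_rate(always_preferred, alternating))  # → 0.5
```

Under these assumptions the two players meet only on the rounds where the alternating agent happens to pick football, so half the rounds end in a mismatch.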

Fig. 5: Overview of the Battle of the Sexes. a, Heat maps showing rates of successful collaboration between the two players and the rates of player 1 choosing its preferred option football. GPT-4 SCoT and GPT-4 performance comparisons are highlighted in red. b, Gameplay between GPT-4 and an agent that alternates between the two options (left) and gameplay between GPT-4 and GPT-4 SCoT that represents a GPT-4 model prompted using the SCoT method to first predict the opponent’s move before making its own move by reasoning about its prediction (right). Both games are also highlighted in blue in the heat maps.

Finding 3: Social chain-of-thought (SCoT) prompting substantially improves GPT-4's cooperation and coordination

After first predicting the other player's intentions, GPT-4 starts taking turns from round 5 onwards, closer to how humans play.

Applying this method improved GPT-4's behaviour, and it started to alternate from round 5 onwards.
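Why does predicting first help? In the Battle of the Sexes, once you expect the other player's move, your best response is simply to match it, so a player that correctly predicts alternation starts alternating too. The Python sketch below uses a hand-written predictor as a stand-in for GPT-4's prediction step (an assumption; in the paper the model itself makes the prediction) against a hypothetical agent that alternates starting with football. In this toy, coordination becomes stable from round 3; that number comes from the toy rule and says nothing about when GPT-4 with SCoT starts alternating.

```python
def alternating(round_index):
    """Hypothetical agent that plays F, B, F, B, ..."""
    return "F" if round_index % 2 == 0 else "B"


def predict_next(opponent_history):
    """Toy predictor (stand-in for the model's prediction step)."""
    if not opponent_history:
        return None  # nothing observed yet
    if len(opponent_history) >= 2 and opponent_history[-1] != opponent_history[-2]:
        # The last two moves differ: expect the opponent to switch again.
        return "B" if opponent_history[-1] == "F" else "F"
    # Fewer than two moves, or the last two are equal: expect a repeat.
    return opponent_history[-1]


def scot_like_player(opponent_history, preferred="F"):
    """Predict the opponent's move first, then match it (the best response in this game)."""
    prediction = predict_next(opponent_history)
    return preferred if prediction is None else prediction


def simulate(rounds=10):
    """Return, for each round, whether the two players coordinated."""
    opponent_history, coordinated = [], []
    for t in range(rounds):
        mine = scot_like_player(opponent_history)
        theirs = alternating(t)
        coordinated.append(mine == theirs)
        opponent_history.append(theirs)
    return coordinated


print(simulate())  # → [True, False, True, True, True, True, True, True, True, True]
```

After one miss while the pattern is still ambiguous, the predict-then-match player locks into the alternation, which is the mechanism SCoT prompting is meant to encourage.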

Fig. 6: Prediction scenarios in the Battle of the Sexes. Top: GPT-4 is a player of the game and predicts the other player’s move. Bottom: GPT-4 is a mere observer of a game between player 1 and player 2 and predicts player 2’s move.

Finding 4: SCoT prompting also makes humans perceive the LLM as more human-like, especially in the Battle of the Sexes

In the Battle of the Sexes, human participants coordinated more successfully with the SCoT-prompted GPT-4 and more often mistook it for a human.

SCoT prompting leads to more successful coordination and joint cooperation between participants and LLMs and makes participants believe more frequently that the other player is human.

Author information

First and corresponding author: Elif Akata

Institute for Human-Centered AI, Helmholtz Munich, Oberschleißheim, Germany


Last author: Eric Schulz

Institute for Human-Centered AI, Helmholtz Munich, Oberschleißheim, Germany


Abstract

Large language models (LLMs) are increasingly used in applications where they interact with humans and other agents. We propose to use behavioural game theory to study LLMs’ cooperation and coordination behaviour. Here we let different LLMs play finitely repeated 2 × 2 games with each other, with human-like strategies, and actual human players. Our results show that LLMs perform particularly well at self-interested games such as the iterated Prisoner’s Dilemma family. However, they behave suboptimally in games that require coordination, such as the Battle of the Sexes. We verify that these behavioural signatures are stable across robustness checks. We also show how GPT-4’s behaviour can be modulated by providing additional information about its opponent and by using a ‘social chain-of-thought’ strategy. This also leads to better scores and more successful coordination when interacting with human players. These results enrich our understanding of LLMs’ social behaviour and pave the way for a behavioural game theory for machines.


Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.