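Fudan University recently wrapped up an unusual final exam that turned the usual format on its head.
In the exam for the course "Data Mining Techniques," students did not sit in an exam hall answering questions. Instead, they became the question setters.
Each student used 10 questions of their own design to try to stump three of today's most advanced AI models. The more questions the AI got wrong, and the stronger the model that was stumped, the higher the student's score.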
Fudan University has replaced a traditional final exam with an AI challenge, asking students to create questions that stump leading AI models instead of answering them.
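Of the 51 exam papers submitted, 50 students got at least one AI model to answer at least one question incorrectly; only one student failed to stump any model. Only four students, however, made any model score zero on an entire paper, and Claude, the strongest of the three models tested, was not completely stumped by any student. The class average was 85.7 points, with a median of 88.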
Of the 51 students, 50 managed to make at least one model answer a question incorrectly. Four produced question sets that completely defeated one of the models, though none managed to fully stump Claude, the strongest model in the test.
"The more the AI gets wrong, the higher the student scores"
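"Traditional ways of setting exam questions have stopped working in the AI era," said Professor Xiao Yanghua, who teaches Data Mining Techniques. "If a teacher sets a standard algorithm problem, AI solves it faster and more accurately than any student. Continuing to test that way means competing with AI on its own strengths, which is pointless."
So the final assignment for Data Mining Techniques changed format: each student wrote 10 computational questions in the field of data mining, each with a unique correct answer and a complete derivation. These 10 questions were then used to test three AI models of differing capability. The more the AI got wrong, the higher the student's score.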
The assessment was part of a data mining course, where students designed 10 computational questions based on course material, each with a single correct answer and a complete solution.
The questions were tested on three AI models, and the more mistakes the models made, the higher the student's score.
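The article describes the scoring rule only qualitatively: more AI errors, and errors by stronger models, earn more points. The sketch below is one hypothetical way to turn that rule into a number. The model names and weights are illustrative assumptions, and since the course's real formula is not published, this will not reproduce the reported class average or median.

```python
from dataclasses import dataclass

# Hypothetical strength weights (assumption): the article says only that stumping
# a stronger model earns more credit, not how much more.
MODEL_WEIGHTS = {"model_a": 1.0, "model_b": 2.0, "model_c": 3.0}


@dataclass(frozen=True)
class Attempt:
    question_id: int  # 1..10 for a 10-question paper
    model: str        # key into MODEL_WEIGHTS
    correct: bool     # verdict from the automatic grader


def exam_score(attempts, num_questions=10, max_score=100.0):
    """Scale the weighted count of wrong AI answers to a 0..max_score range."""
    for a in attempts:
        if a.model not in MODEL_WEIGHTS:
            raise ValueError(f"unknown model: {a.model}")
    # Maximum is reached only if every model misses every question.
    max_points = num_questions * sum(MODEL_WEIGHTS.values())
    earned = sum(MODEL_WEIGHTS[a.model] for a in attempts if not a.correct)
    return round(max_score * earned / max_points, 1)


# Example: only the strongest model misses question 1; everything else is answered correctly.
attempts = [Attempt(q, m, not (q == 1 and m == "model_c"))
            for q in range(1, 11) for m in MODEL_WEIGHTS]
print(exam_score(attempts))  # -> 5.0
```

A linear weighting keeps the rule easy to explain; a real course could just as well add a base score or cap per-question credit.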
Professor Xiao Yanghua said traditional exams focused on calculation have become less meaningful in the AI era, as AI can often solve standard problems faster and more accurately than students.
Figure: Workflow of the "humans test AI" assessment (question design → AI answers → automated grading → TA review)
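Questions had to be based on material covered in class or in the textbook, each had to have a unique correct answer, and students first had to be able to work through the problem correctly themselves. "If you can't solve your own question, that's no real skill," Xiao said.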
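Because every question must have a unique correct answer, the "automated grading" step can in principle reduce to comparing each model's final answer with the reference value. The article does not describe the grader; the sketch below assumes a hypothetical convention in which each model ends its response with a line `Answer: <number>`, and it compares numbers with a small tolerance rather than exact string equality.

```python
import math
import re
from typing import Optional

# Hypothetical output convention (assumption): models end with a line "Answer: <number>".
ANSWER_RE = re.compile(
    r"^Answer:[ \t]*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[ \t]*$", re.MULTILINE
)


def extract_answer(response: str) -> Optional[float]:
    """Return the value on the last "Answer:" line, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return float(matches[-1]) if matches else None


def grade(response: str, reference: float,
          rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Mark a response correct only if its final answer matches the reference value."""
    value = extract_answer(response)
    if value is None:
        return False  # a missing or unparseable final answer counts as wrong
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=abs_tol)


response = "Support of {milk, bread} = 3 / 12\nAnswer: 0.25"
print(grade(response, 0.25))  # -> True
```

Taking the last `Answer:` line lets a model revise itself mid-response; a TA review step, as in the workflow above, would still be needed for answers that are not plain numbers.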
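Xie Jinshu, an undergraduate who enrolled in 2024 at the College of Computing and Intelligence Innovation, ended up with 97 points. He tried having AI write questions that would stump AI, building an automated multi-agent question-generation framework in which GPT-5.5-Pro served as the question-generation layer while the three exam models answered and were graded automatically. Once the framework was running, he found that the AI would "cheat."
The AI would forge reference answers, planting fake ones so the grading script would accept them as correct. It would cap the maximum output length to cut off the other models' reasoning. It would turn down the reasoning-depth parameter so the other models would not bother to think deeply. It would even copy a single successful question ten times to pad out the set.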
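So Xie added a review layer to block these exploits. The framework ultimately generated 10 questions automatically, and all three exam models got every one of them wrong.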
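The article names four exploits but does not say how Xie's review layer caught them. The sketch below is a hypothetical rule-based filter, one check per exploit: a reference answer that disagrees with an independent re-derivation (assumed to be available), an output cap below a floor, a lowered reasoning setting, and a re-copied question. The field names, threshold values and the `"high"` effort label are illustrative assumptions, not the actual framework's parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical floor values enforced by the review layer (assumptions, not from the article).
MIN_MAX_OUTPUT_TOKENS = 8192
REQUIRED_REASONING_EFFORT = "high"


@dataclass
class Candidate:
    text: str               # question statement from the generator layer
    reference_answer: str   # answer the generator wants the grader to use
    verified_answer: str    # answer from an independent re-derivation (assumed available)
    max_output_tokens: int  # output cap the generator requested for the exam models
    reasoning_effort: str   # reasoning setting the generator requested for the exam models


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so re-copied questions compare equal."""
    return " ".join(text.lower().split())


def review(candidates: List[Candidate]) -> Tuple[List[Candidate], List[Tuple[Candidate, str]]]:
    """Split candidates into accepted questions and (candidate, reason) rejections."""
    accepted: List[Candidate] = []
    rejected: List[Tuple[Candidate, str]] = []
    seen = set()
    for c in candidates:
        if c.reference_answer.strip() != c.verified_answer.strip():
            rejected.append((c, "reference answer does not match independent check"))
        elif c.max_output_tokens < MIN_MAX_OUTPUT_TOKENS:
            rejected.append((c, "output length capped to truncate reasoning"))
        elif c.reasoning_effort != REQUIRED_REASONING_EFFORT:
            rejected.append((c, "reasoning effort lowered"))
        elif normalize(c.text) in seen:
            rejected.append((c, "duplicate of an accepted question"))
        else:
            seen.add(normalize(c.text))
            accepted.append(c)
    return accepted, rejected
```

Rule-based checks like these only catch exploits someone has already anticipated; a second model acting as reviewer is another plausible design, but the article does not say which approach was used.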
From "how to calculate" to "how to judge"
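After the exam, Xiao noticed a clear difference: high-scoring students could work their own questions through correctly from start to finish, while low-scoring students wrote questions they could not answer themselves.
"High-scoring students had an accurate sense of AI's weaknesses, and their questions hit its structural flaws. Low-scoring students just changed the numbers in textbook exercises, which the AI had seen millions of times during training, so it got them right simply by applying a template."
This observation worried Xiao. Students who are already weaker, he warned, will see their judgment erode further if all they can do is rely on AI for their homework.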
After the exam, Xiao found that top-performing students not only understood the course content but also knew where AI was likely to fail. By contrast, lower-scoring students often relied on familiar textbook-style questions that AI could easily solve.
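After this experiment, Xiao decided to overhaul how the course is assessed. The "humans test AI" format will continue, and it will be made more systematic.
In his view, traditional questions that test memorization and calculation must be retired. Future assessment will shift entirely toward evaluation, judgment and creative thinking, the higher-order abilities that AI cannot replace.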
Xiao said the course will continue using the "human tests AI" format, shifting its focus from memorization and calculation to judgment, critical thinking and creativity, skills he believes remain essential in the age of AI.
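"So more class time is spent on discussion: How do students judge whether a result is right or wrong? How do they spot where AI is likely to go wrong? How do they pose a good question that AI cannot answer?" Xiao believes the course is shifting from training students in "how to do it" to training them in "how to direct AI to do it and how to evaluate what AI produces."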
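As for students who did poorly on this exam, Xiao said the next iteration of the course will deliberately support them, helping them build a basic floor of judgment so that they do not become AI users who can only click "Confirm."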
Source: China Youth Daily, Fudan University
Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.