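For years, the gold standard for measuring AI progress has been challenging academic tests and abstract puzzles. But the real question has always been: can AI do the actual work that people get paid for?

OpenAI is attempting to answer that question with a new evaluation framework, GDPval, and the results are a wake-up call for every knowledge worker and business leader. In blind evaluations by industry experts, today's best models, such as GPT-5 and Claude Opus 4.1, produced work that holds up against that of human professionals on a meaningful share of tasks. The framework measures performance across 44 knowledge work occupations and is a much-needed reality check for AI.

To unpack the significance of this new evaluation framework, I spoke with Paul Roetzer, founder and CEO of SmarterX and Marketing AI Institute.

GDPval tests AI to determine whether it can do economically valuable knowledge work. Unlike traditional benchmarks that use simple text prompts or exam-style questions, the GDPval evaluation system is built on real-world deliverables and contexts.

The evaluation spans 1,320 specialized tasks, all based on real work products such as legal briefs, engineering blueprints, customer support conversations, and nursing care plans. Every task was crafted by subject matter experts with over a decade of experience, who then served as the blind graders. They compared human and AI-generated deliverables without knowing which was which, and provided critiques and rankings. The tasks are not simple text prompts. They include reference files and context, and the expected deliverables span documents, slides, diagrams, spreadsheets, and multimedia.

This focus on the realities of work is crucial. “The thing we've been talking about for a while is that IQ tests [traditional AI evaluations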