To push the limits of AI, experts are calling for submissions to create the “hardest and broadest set of questions ever” to challenge today’s most advanced AI systems.
The test, named “Humanity’s Last Exam,” is a joint effort of the Center for AI Safety (CAIS) and Scale AI. Scale AI recently raised $1 billion, pushing its overall valuation to $14 billion. Dan Hendrycks, the executive director of CAIS, emphasized the importance of the new test. Referring to OpenAI’s recent o1 model, he remarked that it had “destroyed the most popular reasoning benchmarks.” Given this level of performance, the new exam must raise the bar significantly.
Hendrycks, who co-authored papers on AI testing in 2021, noted how much the field has advanced since then. “The models of today have ‘crushed’ the 2021 tests,” he stated, highlighting the leap from earlier AI systems that gave seemingly random answers.
Unlike the earlier tests, which focused on areas like math and social studies, “Humanity’s Last Exam” will incorporate abstract reasoning, making it much more challenging for AI systems. To protect the integrity of the test, the organizers are also keeping the criteria confidential so that potential answers cannot be incorporated into AI training data.
The call for submissions is open to experts from various fields, including rocketry and philosophy. Participants are encouraged to submit questions that would be difficult for non-experts to answer, with the submission deadline set for November 1. Winning questions will undergo peer review, and successful contributors could earn co-authorship on a related research paper as well as prizes of up to $5,000, courtesy of Scale AI.
While the range of acceptable questions is vast, organizers have made it clear that one topic will not be allowed: weapons. As they told Reuters, “it’s too dangerous for AI to know about.”