Friday, January 24, 2025

Even some of the best AI can’t beat this new benchmark


The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions on subjects such as mathematics, the humanities, and the natural sciences. To make the evaluation tougher, the questions come in multiple formats, including some that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.


