Even some of the best AI can’t beat this new benchmark

January 24, 2025

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions on subjects such as mathematics, the humanities, and the natural sciences. To make the evaluation tougher, the questions come in multiple formats, including some that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.
