Huawei’s New Benchmark Gives AI Agents Months of Your Life—Then Watches Them Fail – Decrypt

May 27, 2026 - 18:15

In brief: Researchers from Huawei and three partner institutions released Claw-Anything, a benchmark that evaluates AI agents on personal-assistant tasks. GPT-5.5, OpenAI’s flagship model, scored only 34.5% on the pass@1 metric, far below its scores on existing benchmarks, which suggests current tests are measuring the wrong things. The team also released an automated data pipeline that produced...
