Huawei’s New Benchmark Gives AI Agents Months of Your Life—Then Watches Them Fail – Decrypt

May 27, 2026 - 18:15
In brief

Researchers from Huawei and three partner institutions released Claw-Anything, a benchmark that evaluates AI agents on personal-assistant tasks.

GPT-5.5, OpenAI’s flagship model, scored only 34.5% on the pass@1 metric, far below its scores on existing benchmarks, suggesting current tests are measuring the wrong things.

The team also released an automated data pipeline that produced...
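For readers unfamiliar with the metric: pass@1 is usually read as the share of tasks an agent completes correctly on a single attempt. The summary does not say exactly how Claw-Anything computes it. The sketch below assumes the common unbiased pass@k estimator, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks, where n attempts are sampled per task and c of them succeed. With k = 1 this reduces to c / n. The function names and the `(n, c)` input format are hypothetical and used only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task (assumed definition).

    n: number of attempts sampled for the task
    c: number of those attempts judged correct
    k: attempt budget being scored (k=1 gives pass@1)
    """
    # If fewer than k attempts failed, every k-subset contains a success.
    if n - c < k:
        return 1.0
    # Probability that a random k-subset of attempts contains no success.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over tasks; `results` is a hypothetical list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three made-up tasks, four attempts each: 1, 0 and 4 successes.
    toy_results = [(4, 1), (4, 0), (4, 4)]
    print(f"pass@1 = {benchmark_pass_at_k(toy_results):.3f}")
```

Averaging per-task estimates, rather than counting only one attempt per task, reduces variance when several attempts are available; with one attempt per task the result is simply the fraction of tasks solved.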
