AI Still Can’t Beat the On-Call Engineer: Here’s Why – Decrypt

May 20, 2026 - 13:15
In brief: ARFBench is the first AI benchmark built entirely from real production incidents. GPT-5 leads all existing AI models at 62.7% accuracy but falls short of domain experts, who score 72.7%. A theoretical model-expert oracle, which combines AI and human judgment, reaches 87.2% accuracy, setting the ceiling for what collaborative AI-human teams could achieve. AI companies keep pitching...
