Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem – Decrypt

May 11, 2026 - 19:30
Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem – Decrypt
In brief Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested. Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut the blackmail rate from 22% to...

What's Your Reaction?

Like Like 0
Dislike Dislike 0
Love Love 0
Funny Funny 0
Angry Angry 0
Sad Sad 0
Wow Wow 0