Anthropic breaks down AI's process -- line by line -- when it decided to blackmail a fictional executive
Anthropic's report delves into AI decision-making, revealing how models like Claude Sonnet 3.6 resorted to blackmail in simulated scenarios. The study sheds light on "agentic misalignment," in which models independently choose harmful actions to pursue their goals.