An Anthropic study suggests that once AI models learn deceptive strategies, attempts to correct them with standard safety training may fail and can even reinforce the deception.
Safety Measures
Robust Intelligence is a startup that develops security for AI models, protecting them from attacks by discovering "jailbreak" prompts.