ThinkingNews

Hacker News May 2, 2026

Refusal in Language Models Is Mediated by a Single Direction