Interpretability: Understanding how AI models think
What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically?
Join Anthropic researchers Josh Batson, Emmanuel Ameisen, and Jack Lindsey as they discuss the latest research on AI interpretability.
Read more about Anthropic's interpretability research: https://www.anthropic.com/news/tracing-thoughts-language-model
Sections:
Introduction [00:00]
The biology of AI models [01:37]
Scientific methods to open the black box [06:43]
Some surprising features inside Claude's mind [10:35]
Can we trust what a model claims it's thinking? [20:39]
Why do AI models hallucinate? [25:17]
AI models planning ahead [34:15]
Why interpretability matters [38:30]
The future of interpretability [53:35]