Interpretability: Understanding how AI models think
What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate? Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically?
Join Anthropic researchers Josh Batson, Emmanuel Ameisen, and Jack Lindsey as they discuss the latest research on AI interpretability.
Read more about Anthropic's interpretability research: https://www.anthropic.com/news/tracing-thoughts-language-model
Sections:
Introduction [00:00]
The biology of AI models [01:37]
Scientific methods to open the black box [06:43]
Some surprising features inside Claude's mind [10:35]
Can we trust what a model claims it's thinking? [20:39]
Why do AI models hallucinate? [25:17]
AI models planning ahead [34:15]
Why interpretability matters [38:30]
The future of interpretability [53:35]