Atomic Facts to Structured Knowledge: Rethinking Unlearning & Jailbreaking in Large Language Models

A Google TechTalk, 2026-02-11, presented by Rongzhe Wei
ABSTRACT: Large language models are increasingly deployed in high-impact settings, making trust and safety central concerns. A growing body of evidence suggests that many failures in these systems share a common root cause: knowledge in LLMs is not stored as isolated atomic facts, but as structured and interdependent internal representations. This talk argues for a shift from atomic views of model knowledge toward structured internal knowledge modeling, and shows how this perspective fundamentally reshapes our understanding of both unlearning and jailbreaking.

On the trust side, by modeling an LLM’s internal correlated knowledge as a structured representation, we reveal why existing unlearning methods often achieve only superficial forgetting: even when a target fact is suppressed, it frequently remains inferable through correlated internal knowledge. We present the first graph-based evaluation framework that exposes severe overestimation of unlearning effectiveness in previous evaluations.

On the safety side, from the same perspective, we show that most existing red-teaming and jailbreaking methods remain confined to a prompt-optimization paradigm that implicitly targets atomic facts, a strategy that increasingly fails against modern commercial LLMs. In contrast, we introduce a new attack paradigm that explores and weaves together benign knowledge fragments within the model’s internal structure, achieving over 95% success against state-of-the-art aligned models.

Together, these results highlight a shared structural vulnerability underlying both unlearning failures and jailbreak robustness. Rethinking LLMs through the lens of structured internal knowledge offers a unifying framework for evaluating, attacking, and ultimately defending modern language models.
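The abstract's point that a suppressed fact can remain inferable through correlated knowledge can be illustrated with a toy sketch. This is not the evaluation framework presented in the talk, whose details the abstract does not give. It assumes, purely for illustration, that knowledge is a set of string facts plus hand-written inference rules, and every name in it (`derivable`, `still_inferable`, the example facts) is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule says: if every premise is known, the conclusion is known too.
# Hypothetical representation, chosen only for this illustration.
Rule = Tuple[FrozenSet[str], str]


def derivable(known: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return every fact reachable from `known` by repeatedly applying `rules`."""
    facts = set(known)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rule_list:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts


def still_inferable(target: str, known: Iterable[str], rules: Iterable[Rule]) -> bool:
    """Check whether `target` can be re-derived after it alone is suppressed."""
    remaining = set(known) - {target}
    return target in derivable(remaining, rules)


# Toy knowledge: the target fact is correlated with two other facts.
facts = {
    "works_at(alice, acme)",
    "hq(acme, paris)",
    "lives_in(alice, paris)",
}
rules = [
    (frozenset({"works_at(alice, acme)", "hq(acme, paris)"}), "lives_in(alice, paris)"),
]

# Suppressing only the target leaves it recoverable from its neighbours.
superficial = still_inferable("lives_in(alice, paris)", facts, rules)

# Suppressing a supporting fact as well removes the inference path.
deeper = still_inferable("lives_in(alice, paris)", facts - {"hq(acme, paris)"}, rules)
```

In this toy setting `superficial` is `True` and `deeper` is `False`: checking only whether the target fact itself is gone would report success in both cases, which is the kind of overestimation the abstract describes.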
