Streaming Attention Approximation via Discrepancy Theory

A Google TechTalk, presented by Ekaterina Kochetkova, 2025-10-23
ABSTRACT: The memory requirements of LLM inference grow rapidly with the context length due to the demands of attention computation. We present BalanceKV, an algorithm that leverages the geometric properties of the key-value cache to compress it without significantly affecting the quality of attention computation. BalanceKV has strong theoretical guarantees grounded in discrepancy theory and, in experiments, improves on existing methods. The full paper is available at arXiv:2502.07861.
About the Speaker: Ekaterina Kochetkova is a third-year CS PhD student at EPFL working with Michael Kapralov. She is broadly interested in applying theoretical insights to develop efficient algorithms for large-scale machine learning. Her recent work focuses on optimizing the memory and runtime of LLM inference and on sublinear graph clustering methods that utilize learned vertex features. More information is available at https://ekaterina-kochetkova.github.io/e_kochetkova.github.io/.
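
To give a feel for how discrepancy theory can shrink a cache, here is a minimal sketch of generic greedy vector balancing. It is not BalanceKV itself and is not taken from the paper: the function names, the random data, and the dimensions are all assumptions made for illustration. Each vector gets a sign of +1 or -1 in a single streaming pass so that the signed sum stays small; keeping only the +1 vectors and doubling them then approximates the sum over all vectors. Real attention weights each value by a softmax score, which this toy example ignores.

```python
import math
import random


def greedy_signs(vectors):
    """Give each vector a sign in {+1, -1}, one vector at a time.

    Only the running signed sum is stored, so the pass is streaming.
    """
    running = [0.0] * len(vectors[0])
    signs = []
    for v in vectors:
        dot = sum(r * x for r, x in zip(running, v))
        # Choose the sign whose inner product with the running sum is non-positive.
        sign = -1 if dot > 0 else 1
        running = [r + sign * x for r, x in zip(running, v)]
        signs.append(sign)
    return signs, running


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def column_sum(vectors):
    """Coordinate-wise sum of a non-empty list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


random.seed(0)  # hypothetical data, for illustration only
vectors = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(64)]

signs, running = greedy_signs(vectors)
kept = [v for v, s in zip(vectors, signs) if s == 1]

# Sum over everything versus twice the sum over the kept subset.
full = column_sum(vectors)
approx = [2.0 * x for x in column_sum(kept)]
error = norm([f - a for f, a in zip(full, approx)])

# Greedy signing guarantees ||signed sum||^2 <= sum of squared norms.
bound = math.sqrt(sum(norm(v) ** 2 for v in vectors))

print(f"kept {len(kept)} of {len(vectors)} vectors")
print(f"error {error:.3f} <= bound {bound:.3f}")
```

The approximation error equals the norm of the signed sum, because the full sum minus twice the kept sum is exactly the negated signed sum. The greedy rule does not guarantee that exactly half of the vectors are kept; it only keeps the signed sum small.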
