Privacy Amplification for Correlated-Noise Mechanisms via b-Min-Sep Subsampling

A Google TechTalk, 2026-02-18, presented by Andy Dong
ABSTRACT: DP-SGD remains the standard approach for private model training. A variety of techniques have been developed to improve its privacy–utility tradeoff, including privacy amplification through data subsampling, leveraging structured randomness in the training process, and correlated-noise mechanisms such as DP matrix factorization (DP-MF). While DP-MF can improve utility, its interaction with subsampling-based amplification is less explored than in the classical DP-SGD setting.
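The abstract names DP-MF without saying what makes its noise "correlated". The standard matrix-factorization idea works as follows. To release a workload A x of per-step gradient sums, factor A = B C, add Gaussian noise z to C x, and release B(C x + z) = A x + B z. The noise B z is then correlated across steps. The sketch below is a reminder of that general idea, not the talk's specific construction. It uses the prefix-sum workload and the well-known square-root factorization B = C, chosen purely for illustration, and the function names are hypothetical. It deliberately omits calibrating sigma to the sensitivity of C, which depends on how often, and how far apart, a single example can participate.

```python
from math import comb

import numpy as np


def sqrt_factor(n: int) -> np.ndarray:
    """Lower-triangular Toeplitz C with C @ C == A, where A is the n x n
    prefix-sum matrix. Entries are the series coefficients of (1 - x) ** -0.5."""
    coeffs = np.array([comb(2 * k, k) / 4**k for k in range(n)])
    C = np.zeros((n, n))
    for i in range(n):
        C[i, : i + 1] = coeffs[i::-1]  # C[i, j] = coeffs[i - j] for j <= i
    return C


def noisy_prefix_sums(x: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Release A @ x + B @ z using the factorization A = B @ C, here with B = C.
    Noise is added to C @ x, so B mixes it across steps (correlated noise).
    Calibrating sigma to the sensitivity of C is omitted in this sketch."""
    C = sqrt_factor(x.shape[0])
    z = np.random.default_rng(seed).normal(0.0, sigma, size=x.shape)
    return C @ (C @ x + z)


A = np.tril(np.ones((6, 6)))
assert np.allclose(sqrt_factor(6) @ sqrt_factor(6), A)
x = np.arange(1.0, 7.0)  # hypothetical per-step gradients, one scalar per step
print(noisy_prefix_sums(x, sigma=0.0))  # noiseless: matches np.cumsum(x) up to rounding
```

With sigma = 0 the mechanism reduces to exact prefix sums. Any other factorization A = B C fits the same release formula, and choosing it well is what DP-MF methods optimize.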
In this talk, I present b-min-sep subsampling, a simple batching scheme for DP-MF. The key idea is to impose only a minimal participation constraint that preserves the structural properties required by correlated-noise mechanisms, while retaining substantial flexibility in sampling. The resulting scheme improves over cyclic Poisson subsampling and is a generalization of balls-in-bins subsampling.
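The abstract describes b-min-sep subsampling only at a high level. As the name suggests, the one hard constraint is that any two participations of the same example are at least b steps apart. The following is a minimal sketch of one sampler that respects that constraint. It assumes, as our own illustrative choice and not necessarily the talk's exact scheme, that every currently eligible example joins each batch independently with probability p. Cyclic Poisson subsampling, where example i is only ever eligible on steps t with t % b == i % b, is included for contrast. All function names and the parameter p are hypothetical.

```python
import numpy as np


def sample_min_sep(num_examples: int, num_steps: int, b: int, p: float,
                   seed: int = 0) -> np.ndarray:
    """Hypothetical b-min-sep sampler (assumption: i.i.d. Bernoulli(p) among
    eligible examples). Returns a bool matrix of shape (num_steps, num_examples)."""
    rng = np.random.default_rng(seed)
    # Steps an example must still sit out before it is eligible again (0 = eligible).
    cooldown = np.zeros(num_examples, dtype=int)
    participation = np.zeros((num_steps, num_examples), dtype=bool)
    for t in range(num_steps):
        chosen = (cooldown == 0) & (rng.random(num_examples) < p)
        participation[t] = chosen
        cooldown = np.maximum(cooldown - 1, 0)
        # A chosen example sits out the next b - 1 steps, so its next
        # participation is at step t + b or later.
        cooldown[chosen] = b - 1
    return participation


def sample_cyclic_poisson(num_examples: int, num_steps: int, b: int, p: float,
                          seed: int = 0) -> np.ndarray:
    """Cyclic Poisson for contrast: example i is only eligible when
    t % b == i % b, and is then included with probability p."""
    rng = np.random.default_rng(seed)
    group = np.arange(num_examples) % b  # fixed group assignment
    participation = np.zeros((num_steps, num_examples), dtype=bool)
    for t in range(num_steps):
        participation[t] = (group == t % b) & (rng.random(num_examples) < p)
    return participation


def min_separation_ok(participation: np.ndarray, b: int) -> bool:
    """True if every example's participations are at least b steps apart."""
    for column in participation.T:
        steps = np.flatnonzero(column)
        if steps.size > 1 and np.diff(steps).min() < b:
            return False
    return True


P = sample_min_sep(num_examples=1000, num_steps=50, b=5, p=0.3)
assert min_separation_ok(P, b=5)
```

Both samplers satisfy the same separation constraint. Cyclic Poisson pins each example to a fixed residue class of steps, while the cooldown-based sampler lets an example rejoin on any step once b steps have passed. That extra freedom is the kind of sampling flexibility the abstract refers to.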
I will give a high-level overview of the privacy analysis based on Monte Carlo accounting and dynamic programming, and present empirical results demonstrating improved privacy–utility tradeoffs in both example-level and multi-attribution user-level settings. More broadly, this work fits into a line of research on leveraging randomness in training for privacy amplification.

