UMD Team Develops Precise “Undo Button” for AI Memory

An illustration depicts a robotic arm removing problematic information from an artificial intelligence model, representing a new “unlearning” technique developed by University of Maryland researchers that can selectively erase sensitive or harmful data without damaging the model’s overall capabilities. (Illustration by Keivan Rezaei)

Imagine trying to remove a single drop of red dye from a gallon of purple paint without ruining the color entirely. For developers of large language models (LLMs), that has long been the challenge of “unlearning”—the process of removing specific information from an AI system after it has already been trained.

Once sensitive personal information, copyrighted text or harmful misinformation becomes embedded in an AI model, it spreads across billions of internal connections. Until now, the most reliable solution was often the most extreme: discard the model and retrain it from scratch, a process that can cost millions of dollars and consume enormous amounts of energy.

Researchers at the University of Maryland and the Max Planck Institute for Software Systems have developed a new alternative—a method that functions like a surgical rollback tool for AI memory.

Led by Soheil Feizi, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS), the team created a framework that can selectively erase unwanted information from an AI model while preserving its overall intelligence and reasoning skills.

Their study, “Revisiting the Past: Data Unlearning with Model State History,” introduces a technique called Model State Arithmetic (MSA). The approach uses saved training snapshots—known as checkpoints—to identify exactly how problematic data altered a model during training and then reverse those changes with remarkable precision.

“We realized that the model’s own development history contains all the information we need to fix it,” said Feizi, who is also affiliated with the University of Maryland Center for Machine Learning and the Institute for Trustworthy AI in Law & Society (TRAILS). “By using these training snapshots as a guide, we can identify exactly how specific data points altered the AI’s behavior and simply reverse those changes. It’s a much more elegant solution than trying to force a finished model to forget through trial and error.”

Most existing unlearning methods attempt to make a completed AI model forget information after the fact, often degrading the system’s broader capabilities in the process. Researchers sometimes describe the side effects as “brain damage,” where models lose general knowledge, reasoning ability or even coherent language generation.

MSA takes a different approach by revisiting earlier stages of the model’s training process. During training, developers routinely save checkpoints as safeguards against crashes or hardware failures. Feizi and his collaborators discovered those same snapshots could be repurposed as a historical record of how information became encoded inside the model.

The system identifies a “clean” checkpoint from before the model encountered the data targeted for removal. Researchers then briefly retrain that earlier version on the unwanted material to measure how the model changed. The resulting mathematical signature—what the team calls a “forget vector”—captures the influence of the problematic data. Subtracting that vector from the final model effectively rolls back the unwanted information while leaving the rest of the system intact.
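The rollback described above comes down to simple weight arithmetic. The Python sketch below illustrates it under stated assumptions and is not the researchers' implementation. Model weights are plain dictionaries of float lists standing in for real tensors. `make_toy_finetune` is a hypothetical stand-in for "briefly retraining," implemented as a few gradient-descent steps on a one-input linear model. The scaling factor `alpha` is an assumed illustrative knob, not a detail taken from the study.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical simplification: a model "state" maps parameter names to flat
# lists of floats, loosely mimicking a framework's saved checkpoint.
State = Dict[str, List[float]]


def subtract(a: State, b: State) -> State:
    """Return the elementwise difference a - b for every parameter."""
    return {k: [x - y for x, y in zip(a[k], b[k])] for k in a}


def forget_vector(clean_ckpt: State, finetune: Callable[[State], State]) -> State:
    """Briefly train a copy of the clean checkpoint on the data to forget and
    return how much each weight moved (the 'forget vector')."""
    tuned = finetune({k: list(v) for k, v in clean_ckpt.items()})
    return subtract(tuned, clean_ckpt)


def unlearn(final_model: State, vector: State, alpha: float = 1.0) -> State:
    """Subtract the scaled forget vector from the final model's weights.
    alpha is an assumed illustrative knob; 1.0 subtracts the full vector."""
    return {
        k: [w - alpha * d for w, d in zip(final_model[k], vector[k])]
        for k in final_model
    }


def make_toy_finetune(
    data: Sequence[Tuple[float, float]], lr: float = 0.1, steps: int = 5
) -> Callable[[State], State]:
    """Hypothetical stand-in for fine-tuning: a few gradient-descent steps on
    mean squared error for a one-input linear model y = w * x + b."""

    def finetune(state: State) -> State:
        w, b = state["w"][0], state["b"][0]
        n = len(data)
        for _ in range(steps):
            # Gradients of mean((w*x + b - y)^2) with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
            gb = sum(2 * (w * x + b - y) for x, y in data) / n
            w -= lr * gw
            b -= lr * gb
        return {"w": [w], "b": [b]}

    return finetune


if __name__ == "__main__":
    clean = {"w": [0.0], "b": [0.0]}  # checkpoint saved before the forget data
    final = {"w": [3.0], "b": [0.5]}  # made-up weights standing in for the final model
    v = forget_vector(clean, make_toy_finetune([(1.0, 2.0)], lr=0.25, steps=1))
    print(v)                  # {'w': [1.0], 'b': [1.0]}
    print(unlearn(final, v))  # {'w': [2.0], 'b': [-0.5]}
```

In a real language model the same subtraction would run over every tensor in the checkpoint. Whether the result actually removes the targeted information while leaving everything else intact is an empirical question, which is what unlearning benchmarks are designed to measure.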

The researchers tested MSA on standard benchmarks designed to evaluate machine unlearning, including the Task of Fictitious Unlearning (TOFU) and the Machine Unlearning Six-Way Evaluation (MUSE). In experiments involving fictional biographies and copyrighted books, the method consistently removed targeted information while preserving the model’s reasoning and conversational performance.

Remarkably, the team found that the technique remained effective even when using checkpoints saved very early in training. Because the clean checkpoint does not need to be close to the final model, even infrequent backups may provide enough information to support precise unlearning.

As governments and technology companies face mounting pressure to comply with privacy laws, copyright protections and emerging “right to be forgotten” standards, the work could offer a practical path forward for AI developers seeking to update or correct models without rebuilding them entirely.

“AI models shouldn’t be permanent black boxes that can never be corrected once they’re trained,” said Keivan Rezaei, a fourth-year doctoral student in computer science at UMD and the paper’s lead author. “Our method shows that it’s possible to precisely remove harmful or sensitive information while preserving the intelligence the model gained from everything else. That’s a critical step toward building AI systems people can actually trust.”

The study was presented at the 2026 International Conference on Learning Representations (ICLR), held last month in Rio de Janeiro. In addition to Feizi and Rezaei, the research team included Mehrdad Saberi, a doctoral student in computer science at UMD, and Abhilasha Ravichander of the Max Planck Institute for Software Systems. Together, their work advances a growing effort at UMD to develop AI technologies that are not only more powerful, but also more transparent, adaptable and accountable.

—Story by Melissa Brachfeld, UMIACS communications group
