TRAILS Researchers Part of Team Receiving $1.8M DARPA Award to Make AI More Trustworthy

Large language models (LLMs)—a form of AI that generates human-like text—are now being embedded in everything from everyday chatbots used for customer service to mission critical systems involved in high-stakes decision-making.

As these technologies grow more powerful and pervasive, ensuring that they behave consistently, adapt to new situations, and remain secure from attacks is more urgent than ever.

Supported by a $1.8 million award from the Defense Advanced Research Projects Agency (DARPA), faculty from the University of Maryland and New York University are currently working to address this challenge, developing fundamental research benchmarks that can help guarantee robustness and accuracy in LLMs.

The multi-institutional team includes Soheil Feizi and Furong Huang, both associate professors of computer science at UMD, and Andrew Wilson, a professor at New York University’s Courant Institute of Mathematical Sciences and Center for Data Science.

“The goal is to create a certifiable safety net that high-stakes applications can rely on,” says Feizi, who along with Huang is part of the Institute for Trustworthy AI in Law & Society (TRAILS).

Unreliable AI can have far-reaching consequences, the researchers say. In high-stakes environments, for example, incorrect outputs could lead to faulty legal or medical advice.

The research team plans to address three core challenges in the use of LLMs in high-stakes settings, beginning with consistency. They aim to ensure that the AI technology produces stable, reliable outputs—even when the prompts or inputs to the LLM platforms are phrased slightly differently.

To support this effort, the researchers will develop innovative mathematical tools, including use of a process called “smoothing” that highlights underlying trends in datasets by eliminating outliers and variations, says Wilson.

The researchers are also examining how LLM platforms perform on tasks or data they were not explicitly trained on. Generalization is a key measure of whether a model truly understands new situations or is merely guessing based on memorized patterns, Huang explains. But by exploring what drives this capability—such as model size, architecture, or training methods—the researchers hope to improve performance in dynamic environments.

“Generalization is not just about performing well on a test set—it’s about demonstrating a real understanding of novel inputs,” says Huang. “We want to move beyond surface-level pattern matching and probe whether LLMs can truly reason in unfamiliar situations.”

The third research thrust will focus on adversarial robustness, which involves protecting LLM technology from being tricked or manipulated. To counter threats like prompt injections, jailbreak attempts, and other adversarial inputs that can currently bypass safeguards—thereby causing harmful outputs or revealing sensitive data—the research team is modeling the AI’s decision-making process as a step-by-step conversation, applying reinforcement learning algorithms to help systems stay aligned with their intended goals.

Feizi says this project was inspired by real-world failures in AI reliability and security, and that the project’s results could one-day benefit multiple domains like healthcare, finance and education.

“AI is advancing quickly,” Feizi says. “But if we want to use it responsibly, especially in high-stakes settings, we must understand how to keep it stable, fair and secure.”

—Story by Melissa Brachfeld, UMIACS communications group

Next
Next

UMD Researchers Investigate Security Threats to Web AI Agents