**London**: Bloomberg’s new research reveals that Retrieval Augmented Generation (RAG), widely used to enhance large language models, may increase unsafe responses rather than improve safety. The study highlights the need for tailored guardrails and domain-specific risk management to ensure safe enterprise AI deployment.
New research published by Bloomberg casts doubt on the commonly held belief that Retrieval Augmented Generation (RAG) inherently improves the safety of large language models (LLMs) in enterprise AI environments. The study, titled ‘RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models,’ evaluated 11 widely used LLMs—including Claude-3.5-Sonnet, Llama-3-8B, and GPT-4o—and found that RAG can paradoxically lead to an increase in unsafe responses from these models.
RAG technology is frequently employed in enterprise AI to provide grounded, accurate, and up-to-date content by retrieving relevant documents to support the AI’s generative process. While Bloomberg’s research does not dispute the efficacy of RAG in improving information accuracy or reducing hallucinations, it reveals an unexpected safety risk: models that usually reject harmful queries outright under standard conditions may generate unsafe answers when operating within a RAG framework.
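For readers unfamiliar with the pattern, the sketch below shows a typical RAG request flow in simplified form; the retriever and generation function are generic placeholders, not components of any system examined in the study.

```python
# Simplified RAG request flow (illustrative only; `retriever` and
# `generate_fn` are hypothetical placeholders, not any system from the study).

def answer_with_rag(query: str, retriever, generate_fn, k: int = 4) -> str:
    # Fetch the k passages most relevant to the query.
    passages = retriever.search(query, top_k=k)
    context = "\n\n".join(p.text for p in passages)

    # The retrieved text is prepended to the user query. The Bloomberg study
    # found that this added context can weaken a model's refusal behaviour
    # even when the passages themselves are benign.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate_fn(prompt)
```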
Sebastian Gehrmann, Bloomberg’s Head of Responsible AI, explained in an interview with VentureBeat, “Systems need to be evaluated in the context they’re deployed in, and you might not be able to just take the word of others that say, ‘Hey, my model is safe, use it, you’re good.’” He illustrated this concern by citing the example of Llama-3-8B, which showed an increase in unsafe responses from 0.3% to 9.2% when RAG was implemented. Gehrmann elaborated that “if you use a large language model out of the box, often they have safeguards built in where, if you ask, ‘How do I do this illegal thing,’ it will say, ‘Sorry, I cannot help you do this.’ We found that if you actually apply this in a RAG setting, one thing that could happen is that the additional retrieved context, even if it does not contain any information that addresses the original malicious query, might still answer that original query.”
The research did not conclusively determine why RAG bypasses existing AI safety guardrails, but the authors point to the way LLMs handle long inputs as a likely factor. The paper observed that “provided with more documents, LLMs tend to be more vulnerable,” indicating that the lengthened context introduced by retrieved text can degrade the efficacy of safety alignment. Amanda Stent, Bloomberg’s Head of AI Strategy and Research, emphasised to VentureBeat, “It’s inherent to the way RAG systems are. The way you escape it is by putting business logic or fact checks or guardrails around the core RAG system.”
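A minimal sketch of the mitigation Stent describes appears below: the core RAG system is left untouched, and a safety check is applied to both the incoming query and the generated answer. The `classify_safety` function stands in for whatever guardrail model or business-logic check an organisation chooses; it is an assumption for illustration, not part of Bloomberg’s research.

```python
# Minimal sketch of wrapping a RAG system with guardrails, in the spirit of
# Stent's suggestion. `rag_system` and `classify_safety` are assumed
# placeholders, not Bloomberg components.

REFUSAL = "Sorry, I can't help with that request."

def guarded_rag(query: str, rag_system, classify_safety) -> str:
    # Screen the query first so clearly malicious requests are refused
    # before any retrieval or generation takes place.
    if not classify_safety(query):
        return REFUSAL

    answer = rag_system(query)

    # Screen the output as well: the study shows refusals that hold for a
    # bare model can fail once retrieved documents enter the prompt.
    if not classify_safety(answer):
        return REFUSAL
    return answer
```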
In tandem with this study, Bloomberg released a second paper, ‘Understanding and Mitigating Risks of Generative AI in Financial Services,’ which introduces a specialised AI content risk taxonomy designed for the financial sector. This taxonomy addresses specific concerns such as financial misconduct, confidential information disclosure, and counterfactual narratives—domains not adequately covered by generic AI safety measures. The researchers conducted empirical tests on several open-source guardrail models, including Llama Guard, Llama Guard 3, AEGIS, and ShieldGemma, by running them against data obtained from red-teaming exercises. Their findings revealed that “these open source guardrails… do not find any of the issues specific to our industry,” underscoring the limitations of general-purpose safety tools in domain-specific contexts.
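The kind of measurement involved can be pictured with a short evaluation loop such as the one below, which reports how many known-unsafe red-team items a guardrail flags; the data format and flagging function are illustrative assumptions, not the paper’s actual test harness.

```python
# Illustrative coverage check for a guardrail model against labelled
# red-team data. `guardrail_flags` and the item format are assumptions,
# not the evaluation harness used in the Bloomberg paper.

def unsafe_coverage(red_team_items, guardrail_flags) -> float:
    """Fraction of known-unsafe items that the guardrail flags."""
    unsafe = [item for item in red_team_items if item["label"] == "unsafe"]
    if not unsafe:
        return 0.0
    caught = sum(1 for item in unsafe if guardrail_flags(item["text"]))
    return caught / len(unsafe)
```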
Gehrmann commented on the necessity of domain-specific safety approaches, noting that “general purpose guardrail models are usually developed for consumer facing specific risks. So they are very much focused on toxicity and bias. While important those concerns are not necessarily specific to any one industry or domain.” The financial services sector, in particular, demands tailored safeguards to manage its unique regulatory and operational challenges.
Bloomberg, a well-established provider of financial data and analytics, approaches generative AI and RAG technologies not only as disruptive innovations but also as tools that can enhance data discovery and analysis capabilities. Amanda Stent highlighted the company’s commitment to transparency in its AI systems, stating, “Everything the system outputs, you can trace back, not only to a document but to the place in the document where it came from.” Bloomberg’s focus on mitigating biases relevant to financial data, such as data drift and model drift across numerous securities, further reflects its prioritisation of domain-appropriate AI governance.
The research implications are significant for enterprises planning to deploy AI systems that utilise RAG. The studies advocate for a fundamental redesign of AI safety architectures, urging organisations to integrate guardrail mechanisms with retrieval processes rather than treating them as separate components. Moreover, the development of industry-specific risk taxonomies tailored to regulatory and operational environments is advised to effectively manage the complex risk landscape of generative AI applications.
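One way to read that recommendation is sketched below: rather than bolting a checker onto the end of the pipeline, the guardrail screens passages inside the retrieval step, against a domain-specific policy, before they reach the prompt. The retriever and policy function are hypothetical placeholders used only to illustrate the architecture.

```python
# Sketch of integrating a guardrail with retrieval: passages are screened
# against a domain-specific policy before entering the prompt.
# `retriever` and `violates_policy` are illustrative placeholders.

def retrieve_with_policy(query: str, retriever, violates_policy, k: int = 4):
    """Return retrieved passages with policy-violating ones filtered out."""
    passages = retriever.search(query, top_k=k)
    return [p for p in passages if not violates_policy(query, p.text)]
```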
Sebastian Gehrmann summarised the practical approach for enterprises by stating, “It really starts by being aware that these issues might occur, taking the action of actually measuring them and identifying these issues and then developing safeguards that are specific to the application that you’re building.”
As AI becomes further embedded in mission-critical business workflows, addressing these nuanced safety challenges will shape how organisations design and implement responsible AI systems.
Source: Noah Wire Services