The International Committee of the Red Cross (ICRC) recently issued a stark warning that has resonated through research communities worldwide: popular artificial intelligence models like OpenAI's ChatGPT, Google's Gemini, and Microsoft's Copilot are generating "incorrect or fabricated archival references," sending researchers on wild goose chases for journals and documents that don't exist. This phenomenon, often called "AI slop," is more than a technical glitch: it wastes researchers' time and creates real operational headaches for institutions, which must prove that the cited items do not exist.

The Scale of the Problem: From Archives to Courtrooms

According to the ICRC statement, AI models are inventing entirely fictional publications like the "Journal of International Relief" and the "International Humanitarian Digital Repository." These fabrications aren't just harmless errors—they're creating measurable burdens on research institutions. Sarah Falls, chief of researcher engagement at the Library of Virginia, reports that approximately 15 percent of emailed reference questions her library receives are now ChatGPT-generated, with many containing hallucinated citations for both published works and unique primary source documents.

This problem extends far beyond academic archives. The legal profession has already seen serious consequences when AI hallucinations migrate into official filings. Recent reporting documents dozens of cases in which attorneys submitted briefs citing non-existent cases generated by chatbots; judges in multiple jurisdictions have publicly admonished counsel, fined firms, and warned that reliance on unverified AI research can amount to malpractice. These incidents show that fabricated citations carry financial sanctions and ethical consequences, not just academic inconvenience.

Why AI Makes Up References: The Technical Reality

Large language models are statistical sequence predictors optimized to continue text in a way that is likely given their training data. They're not search engines: unless explicitly connected to a reliable retrieval layer, they will fabricate plausible continuations when correct factual material is absent. Because their objective is to produce fluent responses rather than to make claims only when certain, models often generate false but plausible bibliographic entries and archival descriptors.
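To make the "plausible continuation" point concrete, here is a deliberately tiny sketch, not a real language model: a word-level bigram table built from two toy titles. The two training strings and the function names are illustrative assumptions, chosen only to show how a predictor that knows which word tends to follow which can emit a title it never saw.

```python
from collections import defaultdict

def train_bigrams(titles):
    """Record, for each word, the set of words seen immediately after it."""
    successors = defaultdict(set)
    for title in titles:
        words = ["<s>"] + title.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            successors[prev].add(nxt)
    return successors

def generate_all(successors, max_words=6):
    """Enumerate every title the bigram table considers a valid continuation."""
    results = set()
    stack = [("<s>", [])]
    while stack:
        word, so_far = stack.pop()
        for nxt in successors[word]:
            if nxt == "</s>":
                results.add(" ".join(so_far))
            elif len(so_far) < max_words:
                stack.append((nxt, so_far + [nxt]))
    return results

# Toy training data (hypothetical strings, not a claim about any real corpus).
corpus = ["Journal of International Law", "Review of International Relief"]

generated = generate_all(train_bigrams(corpus))
invented = sorted(generated - set(corpus))
print(invented)
# ['Journal of International Relief', 'Review of International Law']
```

Every word-to-word step in the invented titles was observed in training, so each looks locally "likely", yet neither title exists in the toy corpus. Real models are vastly more sophisticated, but the failure mode is of the same kind.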

Recent controlled evaluations support the view that this limitation is structural. Across multiple chatbots, fewer than 30 percent of generated academic references were entirely correct, with a large share partially correct or wholly fabricated. In one such study, only about a quarter of generated references were fully correct, while nearly 40 percent were erroneous or entirely fabricated. These results suggest that hallucinated citations are a property of this class of models, not an isolated bug.

The Unique Burden on Archivists and Librarians

Archivists face a particularly challenging burden when dealing with AI-generated requests. A fabricated citation to a "unique primary source" requires an archivist to search finding aids, accession registers, and sometimes decades of uncatalogued material to show that an item doesn't exist. As Falls explains, "For our staff, it is much harder to prove that a unique record doesn't exist." This creates a resource drain where archivists and reference librarians spend scarce time chasing nonexistent items, diverting resources from genuine research assistance.

The problem is forcing institutions to change workflows and introduce new policies. The Library of Virginia now asks researchers to vet their sources before making requests and to disclose if a source originated from AI. "We'll likely also be letting our users know that we must limit how much time we spend verifying information," Falls says. This operational shift illustrates how user behavior is changing faster than institutional processes can adapt.

Academic Integrity Under Threat

Universities confront a twin threat from AI-generated citations. Students who rely on AI-generated references uncritically can submit papers riddled with fabricated citations, while faculty who use LLM assistance may inadvertently include bad references in literature reviews. The reproducibility and traceability of scholarship depend on verifiable references; fabricated citations can mislead peer reviewers and amplify false claims into the academic record.

This contamination of scholarship represents a serious long-term risk. Fabricated citations that slip into theses, articles, or briefs can misdirect follow-on work and create false research trails that waste other researchers' time. The problem scales quickly because models can invent many plausible but false items rapidly, potentially outpacing manual verification if institutions don't adapt their processes.

Technical Solutions and Their Limitations

Retrieval-Augmented Generation (RAG) offers a promising technical approach to reducing hallucinations. In RAG systems, the model consults a curated database or live index and conditions its output on retrieved documents. Properly engineered RAG pipelines can dramatically cut hallucination rates for citations by forcing the model to ground its answers in verifiable records.
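The grounding constraint at the heart of RAG can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a production pipeline: the three index records are made-up placeholders, the keyword-overlap retriever stands in for a real search service, and the language model itself is omitted so that only the "cite what was retrieved, otherwise abstain" rule is visible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    record_id: str
    title: str
    year: int

# Hypothetical curated index; a real system would query a catalogue or search service.
INDEX = [
    Record("rec-001", "Annual report on relief operations", 1948),
    Record("rec-002", "Correspondence on prisoner of war visits", 1943),
    Record("rec-003", "Minutes of the relief committee", 1951),
]

def tokenize(text):
    """Lowercase keywords longer than three characters, punctuation stripped."""
    return {w.strip(".,;:?!").lower() for w in text.split() if len(w) > 3}

def retrieve(query, index, min_overlap=1, top_k=2):
    """Return up to top_k records sharing at least min_overlap keywords with the query."""
    query_terms = tokenize(query)
    scored = []
    for rec in index:
        overlap = len(query_terms & tokenize(rec.title))
        if overlap >= min_overlap:
            scored.append((overlap, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]

def grounded_citations(query, index):
    """Cite only retrieved records; abstain rather than invent when nothing matches."""
    hits = retrieve(query, index)
    if not hits:
        return "No matching record found in the index."
    return "; ".join(f"{r.title} ({r.year}) [{r.record_id}]" for r in hits)

print(grounded_citations("relief operations report", INDEX))
print(grounded_citations("medieval shipping ledgers", INDEX))
```

The design choice worth noting is the explicit abstention branch: every citation in the output carries a record identifier from the index, and a query with no match yields a refusal instead of a fluent guess.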

However, not all consumer chatbots use robust RAG implementations, and even when they do, retrieval quality and index freshness vary. This means hallucinations remain a live risk even with advanced systems. Promising technical work—including retrieval-first architectures and specialized academic LLMs trained to emit accurate citations—shows the path forward, but widespread adoption depends on vendor commitment and regulatory or market pressure.

Practical Guidance for Researchers and Institutions

For researchers, students, and independent writers, the fundamental rule is simple: treat AI as a brainstorming tool, not an authority. Use chatbots to generate search terms, topical overviews, and draft language—but not to cite primary sources without verification. Every reference must be confirmed through library catalogues, publisher databases, CrossRef and DOI lookups, or primary archive finding aids.
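As one hedged illustration of a DOI lookup, the sketch below normalizes a DOI string and asks the public Crossref REST API (`https://api.crossref.org/works/{doi}`) whether a record is registered. The function names and the loose syntax pattern are assumptions made for this example; Crossref covers only DOIs registered with it, so a miss there is a prompt for further checking, not proof of fabrication.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Loose syntax check only (assumption): "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def normalize_doi(raw):
    """Strip common prefixes; return the bare DOI, or None if it is not DOI-shaped."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi if DOI_PATTERN.match(doi) else None

def crossref_url(doi):
    """Build the Crossref works endpoint URL for a DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")

def lookup_title(doi, timeout=10):
    """Return the registered title, or None if Crossref has no record (HTTP 404)."""
    try:
        with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
    titles = data["message"].get("title", [])
    return titles[0] if titles else ""
```

A typical use is to call `normalize_doi` on whatever the chatbot supplied, then compare the title returned by `lookup_title` against the title in the citation. A DOI that resolves to a different work is as much a red flag as one that does not resolve at all.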

Institutions are developing specific strategies to manage this challenge:

For librarians, archivists, and information professionals:
- Update intake workflows to require requesters to indicate if an AI tool produced a citation
- Set realistic service limits for staff time spent verifying unverifiable claims
- Provide training materials on AI hallucinations and verification techniques
- Deploy or pilot RAG systems for internal reference tools

For universities, publishers, and professional bodies:
- Adopt verification policies mandating human verification of AI-aided references
- Institute penalties for negligent reliance on unverified AI output
- Support development of automated citation-verification tools
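What an automated citation-verification tool might do at its simplest is triage: compare a cited title against a trusted catalogue and sort it into "verified", "needs review", or "not found". The sketch below assumes a tiny in-memory catalogue and arbitrary similarity thresholds chosen for illustration; a real tool would query library systems and tune the cut-offs against its own data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def triage(cited_title, catalogue, verified_at=0.95, review_at=0.75):
    """Classify a cited title by its best match in the catalogue.

    The thresholds are illustrative assumptions, not recommended values.
    """
    best_title, best_score = None, 0.0
    for title in catalogue:
        score = similarity(cited_title, title)
        if score > best_score:
            best_title, best_score = title, score
    if best_score >= verified_at:
        return ("verified", best_title)
    if best_score >= review_at:
        return ("needs_review", best_title)
    return ("not_found", None)

# Small example catalogue; a real deployment would use the institution's own holdings.
catalogue = ["International Review of the Red Cross", "The American Archivist"]

print(triage("international review of the red cross", catalogue))
print(triage("Journal of International Relief", catalogue))
```

The middle band matters most in practice: near-matches are routed to a human rather than silently accepted or rejected, which keeps staff time focused on the ambiguous cases.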

The Path Forward: Verification Before Reliance

The rise of generative AI has delivered undeniable productivity gains, but the same fluency that makes these models useful also enables them to invent plausible-sounding fabrications that waste staff time, undermine scholarship, and expose professionals to legal and ethical risk. Addressing this problem requires coordinated action on multiple fronts: user education and verification discipline; institutional policy and workflow redesign; and vendor engineering to ground outputs and expose provenance.

Until these changes are broadly adopted, the safest rule for researchers and professionals is unchanged: never accept an AI citation at face value, and verify it before you rely on it. As the ICRC recommends, people should consult online catalogs or the references in existing published scholarly works to find real studies, rather than assuming that anything cited by an AI is real, no matter how authoritative it sounds.

The phenomenon of AI-generated fabricated citations represents a critical moment for research integrity. How institutions, developers, and users respond will determine whether AI becomes a reliable research assistant or a source of persistent misinformation in scholarly work. The current evidence suggests we're at a crossroads—one that requires both technological innovation and renewed commitment to verification practices that have long been the foundation of credible research.