UB Researchers Develop AI System to Detect AI-Generated Radiology Reports

Researchers at the University at Buffalo (UB) have developed an AI-based system designed to distinguish radiology reports written by clinicians from those generated by large language models — a capability intended to detect falsified medical documentation and prevent fraudulent insurance claims. The work was led by Nalini Ratha, SUNY Empire Innovation Professor in the Department of Computer Science and Engineering at UB, alongside PhD students Arjun Ramesh Kaushik and Tanvi Ranga. The team presented their findings at the GenAI4Health workshop at the Conference on Neural Information Processing Systems (NeurIPS) in December 2025.

The Problem: Generative AI and Medical Documentation Fraud

The growing capability of large language models to produce convincing, domain-appropriate text has created a new category of risk in healthcare: the fabrication of radiology reports that appear authentic to human reviewers. Such synthetic reports could be used to falsify patient medical histories, fabricate evidence supporting fraudulent insurance claims, or manipulate electronic health records in legal or administrative contexts.

“With generative AI becoming more capable of producing remarkably convincing radiology reports, there’s a greater risk of fabricated reports being used to falsify medical histories and support fraudulent claims,” Ratha said. “Radiology reports have highly specialized structure, vocabulary and stylistic norms, making general-purpose detectors unreliable. Therefore, our goal was to build a detection framework designed specifically for radiology that can distinguish clinician-written medical documentation from synthetic text before it reaches clinical or insurance workflows.”

Methodology: Dataset and BERT-Mamba Architecture

To develop and validate their system, the researchers assembled a dataset of 14,000 pairs of chest X-ray reports, each pairing a radiologist-written report with an AI-generated counterpart. Synthetic reports were produced in two ways: by paraphrasing existing reports using large language models, and by generating reports directly from radiographs using vision-language models. The dataset focuses on the findings section of radiology reports, which typically contains the detailed clinical observations and domain-specific terminology that most distinguish human radiologist writing from AI output.
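The article does not describe the team's data format, but the paired structure it describes can be sketched as follows. The field names, labels, and example texts here are illustrative assumptions, not the UB team's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ReportPair:
    """One human/synthetic pair of chest X-ray findings (illustrative schema)."""
    study_id: str
    human_findings: str       # findings section written by a radiologist
    synthetic_findings: str   # AI-generated counterpart
    generation_method: str    # "llm_paraphrase" or "vlm_from_image"

def to_examples(pairs):
    """Flatten pairs into (text, label) training examples: 1 = human, 0 = synthetic."""
    examples = []
    for p in pairs:
        examples.append((p.human_findings, 1))
        examples.append((p.synthetic_findings, 0))
    return examples

pairs = [ReportPair(
    study_id="cxr-0001",
    human_findings="Lungs clear. No effusion.",
    synthetic_findings="The lungs appear well aerated with no evidence of pleural effusion.",
    generation_method="llm_paraphrase",
)]
examples = to_examples(pairs)
print(len(examples))  # 2 examples per pair
```

Keeping the generation method alongside each pair would let a team measure detection accuracy separately for paraphrased versus image-derived synthetic reports.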

Using this dataset, the team built a detection framework based on a BERT-Mamba architecture specifically designed to separate stylistic patterns from clinical content. The core hypothesis is that language models frequently replicate medical terminology accurately but differ from clinicians in writing style — and these differences, while subtle, are computationally detectable with sufficient precision to be clinically actionable.
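The article does not detail how the BERT-Mamba model disentangles style from content; that separation is learned. A crude way to see the underlying idea, though, is to mask domain vocabulary and look at what remains: the phrasing, punctuation, and sentence rhythm. The term list and masking scheme below are made-up stand-ins for illustration, not part of the UB framework:

```python
import re

# Toy stand-in for clinical vocabulary; the real model learns this distinction.
MEDICAL_TERMS = {"effusion", "pneumothorax", "cardiomegaly", "consolidation", "lungs"}

def style_skeleton(text: str) -> str:
    """Replace domain terms with a placeholder, leaving only stylistic structure."""
    def mask(m):
        return "[TERM]" if m.group(0).lower() in MEDICAL_TERMS else m.group(0)
    return re.sub(r"[A-Za-z]+", mask, text)

human = "Lungs clear. No effusion."
llm = "The lungs appear well aerated, with no evidence of pleural effusion."
print(style_skeleton(human))  # [TERM] clear. No [TERM].
print(style_skeleton(llm))    # The [TERM] appear well aerated, with no evidence of pleural [TERM].
```

Even with the clinical terms hidden, the two skeletons read very differently, which is the signal the detector exploits.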

Results: MCC Scores of 92% to 100%

In testing, the system achieved Matthews Correlation Coefficient (MCC) scores ranging from 92% to 100% (0.92 to 1.00 on the coefficient's native −1 to +1 scale) when distinguishing human-written from AI-generated radiology reports. Notably, the model also identified synthetic reports produced by AI systems it had not encountered during training, demonstrating meaningful generalization beyond the specific LLMs included in the training set.
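MCC is a stricter measure than raw accuracy because it accounts for all four cells of the binary confusion matrix, so it stays informative even when classes are imbalanced. The confusion-matrix numbers below are hypothetical, chosen only to show how an MCC of 0.92 arises:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from a binary confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical evaluation (not the UB team's figures): out of 200 reports,
# the detector mislabels 4 human and 4 synthetic ones.
print(round(mcc(tp=96, tn=96, fp=4, fn=4), 2))  # 0.92
```

A perfect detector (fp = fn = 0) yields an MCC of 1.0, the upper end of the range the team reported.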

“AI systems leave subtle stylistic fingerprints such as patterns in phrasing, punctuation, and word choice that differ from how radiologists naturally write. By disentangling style from content and treating it as its own measurable feature, our model was able to detect those patterns with exceptional precision,” said Kaushik.

Ranga added a key observation: “What we found is LLMs tend to write in polished, expansive language, while clinicians write in concise, direct terms.” This asymmetry between the elaborate language patterns of LLMs and the terse, protocol-driven style of professional radiology reports forms the core exploitable signal in the BERT-Mamba framework.
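That verbosity asymmetry can be made concrete with simple surface statistics. The counts below are crude, illustrative stylometric features, not the learned representations of the BERT-Mamba model, and the sample texts and hedging-word list are assumptions:

```python
import re

def style_features(text: str) -> dict:
    """Crude stylometric features: words per sentence and hedging-word count."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    hedges = {"appear", "appears", "evidence", "suggestive", "likely"}
    return {
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "hedging_words": sum(w.lower() in hedges for w in words),
    }

clinician = "Lungs clear. No effusion. Heart size normal."
llm = ("The lungs appear well aerated bilaterally, with no evidence of "
       "pleural effusion. The cardiac silhouette appears within normal limits.")
print(style_features(clinician))  # short sentences, no hedging
print(style_features(llm))        # longer sentences, more hedging language
```

Even on these two toy snippets, the LLM-style text runs several times longer per sentence than the terse, protocol-driven clinician style, which is the kind of contrast the detector learns to exploit at scale.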

Implications for Healthcare Information Security

The significance of this research extends well beyond the academic context. In a healthcare ecosystem where digital radiology reports circulate between providers, insurance companies, regulators, and PACS systems, the ability to authenticate the human authorship of a medical document could become a key component of clinical information security infrastructure. As AI in medical imaging is already broadly discussed in the context of detection and diagnosis, its application to document authenticity opens an entirely new domain.

For health insurers and managed care organizations, the integration of radiology report authentication tools into claims validation workflows could significantly reduce fraud costs. In legal and regulatory contexts, the ability to determine whether a submitted report was written by a clinician or generated by an LLM could be material to dispute resolution and compliance auditing.

Broader Context: Generative AI in Radiology Practice

The UB research emerges at a moment when generative AI is being explored across multiple radiology contexts — from automated report drafting (with subsequent human review) to clinical decision support. Tools such as GPT-4V, Med-PaLM, and proprietary solutions from vendors including Nuance (Microsoft) already enable structured text generation from medical images or voice dictation. As these capabilities expand, the proliferation of AI-generated text in clinical workflows creates an urgent need for authentication mechanisms.

Just as AI-integrated PACS systems must ensure the integrity of the imaging chain of custody, report generation and transmission systems will need authorship verification tools. The UB research provides a concrete technical foundation for such systems, demonstrating that the stylistic fingerprints of AI-generated text are detectable even when medical terminology is reproduced accurately.

Next Steps

The researchers plan to expand their dataset to include additional radiology modalities beyond chest X-rays, and to test the framework against a broader range of AI models — including those that emerge after the publication date of the current work. Their long-term goal is to release the framework publicly, enabling health systems, insurers, and regulators to integrate radiology report authorship verification into their own workflows. If adopted at scale, this capability could meaningfully strengthen the integrity of medical documentation across the digital health ecosystem.

Source: DOTmed
