Derya Unutmaz, an immunologist at The Jackson Laboratory for Genomic Medicine, had a dataset that had sat untouched for years, not because it lacked value but because the human mind alone couldn’t tease out its secrets. The data, derived from T cell experiments, was complex, multidimensional, and stubbornly resistant to conventional analytical approaches. In 2025, Unutmaz turned to OpenAI’s GPT-5 Pro, and within days the model delivered a detailed mechanistic explanation for the observed immune behavior, complete with a prioritized list of follow-up experiments. The breakthrough didn’t just save months of manual analysis; it reshaped how the lab thinks about the intersection of artificial intelligence and scientific discovery.

The case, shared by Unutmaz in professional forums and later confirmed by OpenAI, offers one of the earliest detailed accounts of GPT-5 Pro being applied in a hard-science laboratory setting. It underscores a pivotal shift: AI is no longer just an assistant for writing code or summarizing papers; it is becoming an active partner in formulating and refining scientific hypotheses. Critically, the success relied on a tight loop of AI-generated insights and rigorous expert validation, a way of working that many experts say will define the next era of research.

The Dataset That Wouldn’t Speak

Unutmaz’s lab focuses on T cell biology, specifically how these immune cells decide between becoming effector cells (attackers) and memory cells (long-term sentinels). The dataset in question combined single-cell RNA sequencing, flow cytometry markers, a phosphoproteomics sub-dataset, and time-series measurements tracking cellular differentiation after stimulation. Despite multiple statistical analyses and visualizations, the team couldn’t construct a coherent narrative that explained the branching points and the subtle regulatory signals. “We knew there was a story in there,” Unutmaz later recounted, “but the relationships were too nested and non-linear for our usual toolkit.”

In essence, the data posed a hard problem of pattern recognition and causal inference: the kind of challenge that large language models are, in principle, well suited to, provided they can process numerical and relational data alongside natural language.

Enter GPT-5 Pro: A New Breed of Scientific AI

GPT-5 Pro, released by OpenAI in 2025, represents a significant leap from its predecessors. Where GPT-4 could handle text and rudimentary data analysis, GPT-5 Pro was purpose-built for professional and scientific workflows. It integrates advanced reasoning capabilities, a much larger context window, native multimodal data ingestion, and, importantly, specialized “agents” that can run statistical code, query external databases, and iteratively refine hypotheses based on user feedback.

Key features that proved critical for Unutmaz’s work:
- Native CSV and HDF5 support: The model can directly ingest and analyze large structured datasets without manual conversion.
- Hypothesis-generation mode: A dedicated reasoning engine that scans data for anomalies, correlations, and potential causal chains, then proposes mechanistic models in natural language with accompanying diagrams and code.
- Chain-of-verification: Before presenting a hypothesis, GPT-5 Pro internally fact-checks against known biological principles and flags uncertainties.
- Explainable outputs: Every suggestion comes with a transparent chain of logic, allowing scientists to audit the AI’s reasoning step by step.

Unutmaz uploaded the T cell dataset, specified the experimental conditions, and asked a simple but broad question: “What regulatory networks best explain the observed differentiation outcomes, and what should we test next?”

How the AI Cracked the Case

According to the case report, GPT-5 Pro’s analysis unfolded in four phases over 48 hours, with Unutmaz periodically reviewing and guiding the process.

Phase 1: Data Triangulation and Cleaning

The model automatically detected batch effects between sequencing runs, normalized the data, and mapped all features to standardized Gene Ontology (GO) terms. It also flagged ten genes whose expression patterns were unexpected given the cell-surface marker profiles; previous manual analyses had dismissed these genes as noise.
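The case report does not say how the model handled batch effects. As a rough illustration only, the sketch below shows one common recipe, assuming a cells × genes count matrix and a per-cell batch label: library-size normalization to counts per 10,000 followed by log1p, a per-gene score for how much variance batch membership explains, and per-batch mean centering as a crude stand-in for more careful corrections such as ComBat. All function names are hypothetical.

```python
import numpy as np


def normalize_counts(counts: np.ndarray) -> np.ndarray:
    """Scale each cell (row) to counts per 10,000, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * 1e4)


def batch_effect_score(expr: np.ndarray, batches: np.ndarray) -> float:
    """Mean fraction of each gene's variance explained by batch membership (0 = none)."""
    grand = expr.mean(axis=0)
    total_ss = ((expr - grand) ** 2).sum(axis=0)
    between_ss = np.zeros_like(grand)
    for b in np.unique(batches):
        mask = batches == b
        between_ss += mask.sum() * (expr[mask].mean(axis=0) - grand) ** 2
    keep = total_ss > 0  # skip genes with no variance at all
    return float(np.mean(between_ss[keep] / total_ss[keep]))


def center_by_batch(expr: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Shift every batch so its per-gene means match the overall per-gene means."""
    corrected = expr.copy()
    grand = expr.mean(axis=0)
    for b in np.unique(batches):
        mask = batches == b
        corrected[mask] += grand - expr[mask].mean(axis=0)
    return corrected
```

Mean centering removes only additive shifts, and if batch is confounded with biology (for example, every day-three sample sequenced in the same run) it will erase real signal along with the artifact, which is one reason a human still needs to inspect the result.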

Phase 2: Network Inference

Using a combination of mutual information metrics and a built-in gene regulatory network inference tool, GPT-5 Pro proposed a three-node feedback loop involving transcription factors TCF-1, BCL6, and BLIMP-1. The loop naturally explained the checkpoint-like pauses the team had observed but couldn’t rationalize. This was not a known motif in the literature for this specific T cell subset, but the AI provided analogies from B cell development that fit the topology.
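Mutual information (MI) measures how much knowing one gene’s expression reduces uncertainty about another’s, and unlike correlation it also picks up non-linear dependencies. The sketch below is a minimal, hypothetical plug-in estimator over equal-frequency bins that scores every gene pair; it is not the network inference tool GPT-5 Pro used, and the bin count is an arbitrary assumption.

```python
from itertools import combinations

import numpy as np


def discretize(x: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Assign each value to one of n_bins equal-frequency (quantile) bins, labeled 0..n_bins-1."""
    inner_edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(inner_edges, x, side="right")


def mutual_information(x: np.ndarray, y: np.ndarray, n_bins: int = 8) -> float:
    """Plug-in MI estimate, in nats, from the joint histogram of the binned profiles."""
    xd, yd = discretize(x, n_bins), discretize(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (xd, yd), 1)  # count co-occurring bin pairs
    joint /= joint.sum()
    # Outer product of the marginals: the joint distribution expected under independence.
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0  # 0 * log 0 is taken as 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / expected[nonzero])))


def rank_edges(profiles: dict[str, np.ndarray], n_bins: int = 8) -> list[tuple[str, str, float]]:
    """Score every gene pair by MI and return candidate edges, strongest first."""
    scored = [(a, b, mutual_information(profiles[a], profiles[b], n_bins))
              for a, b in combinations(profiles, 2)]
    return sorted(scored, key=lambda edge: edge[2], reverse=True)
```

MI is symmetric and unsigned, so a ranking like this yields only undirected candidate edges. Deciding which way a link points and whether it activates or represses is where analogies such as the B cell case, and ultimately targeted perturbations, come in.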

Phase 3: Mechanistic Narrative

GPT-5 Pro generated a plain-English description of the proposed mechanism, complete with a downloadable graphical abstract and a set of mathematical equations for the dynamics. The narrative explained how a slight imbalance in IL-2 signaling at day three could push cells toward the memory lineage, while sustained T-cell receptor engagement reinforced the effector program. It also highlighted a potential role for the kinase AKT in mediating the switch, a detail that had been hidden in the phosphoproteomics sub-dataset.
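To make “equations for the dynamics” concrete, here is a toy model, not the one GPT-5 Pro produced. It assumes a hypothetical wiring in which TCF-1 activates BCL6, BCL6 represses BLIMP-1, and BLIMP-1 represses TCF-1, each with Hill-type kinetics and first-order decay, plus a constant input `s` on BLIMP-1 standing in for sustained T-cell receptor engagement. IL-2 and AKT are left out, and the parameter values are arbitrary.

```python
def simulate_loop(s: float, tcf1: float, bcl6: float, blimp1: float,
                  a: float = 4.0, n: int = 4, dt: float = 0.01, steps: int = 5000):
    """Forward-Euler integration of a hypothetical TCF-1 -> BCL6 -| BLIMP-1 -| TCF-1 loop.

    d[TCF-1]/dt   = a / (1 + BLIMP-1^n) - TCF-1         (repressed by BLIMP-1)
    d[BCL6]/dt    = a * TCF-1^n / (1 + TCF-1^n) - BCL6  (activated by TCF-1)
    d[BLIMP-1]/dt = a / (1 + BCL6^n) + s - BLIMP-1      (repressed by BCL6, driven by s)

    Returns the final (TCF-1, BCL6, BLIMP-1) levels in arbitrary units.
    """
    for _ in range(steps):
        d_tcf1 = a / (1 + blimp1 ** n) - tcf1
        d_bcl6 = a * tcf1 ** n / (1 + tcf1 ** n) - bcl6
        d_blimp1 = a / (1 + bcl6 ** n) + s - blimp1
        # dt < 1 keeps every level non-negative, since each update mixes the old value with a non-negative target.
        tcf1, bcl6, blimp1 = tcf1 + dt * d_tcf1, bcl6 + dt * d_bcl6, blimp1 + dt * d_blimp1
    return tcf1, bcl6, blimp1
```

Two repressions and one activation give the loop a net positive sign, and with these parameters it is bistable: at `s = 0` the system stays in whichever of the two stable states it starts near (TCF-1-high or BLIMP-1-high), while a strong enough drive (`s = 3`) leaves only the BLIMP-1-high, effector-like state. History dependence of this kind is one generic way a feedback loop can produce switch-like fate decisions.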

Phase 4: Experiment Design

Finally, the model prioritized five follow-up experiments, ranking them by feasibility and potential to falsify the hypothesis. These included targeted CRISPR knockouts of the three transcription factors, a time-course phospho-AKT assay, and an adoptive transfer experiment to test in vivo relevance. Each suggestion came with detailed protocols, expected outcomes, and contingency plans if the results contradicted the model.
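The article does not say how the model weighed feasibility against the power to falsify. One simple, hypothetical way to produce such a ranking is a weighted sum of two scores; the weights, the 0-to-1 scale, and the experiment names in the test are illustrative placeholders, not values from the case report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    name: str
    feasibility: float    # 0..1, higher = cheaper, faster, or technically easier (hypothetical scale)
    falsification: float  # 0..1, higher = more likely to refute the hypothesis if it is wrong


def priority(exp: Experiment, w_feasibility: float = 0.4, w_falsification: float = 0.6) -> float:
    """Weighted sum of the two criteria; the default weights are illustrative assumptions."""
    return w_feasibility * exp.feasibility + w_falsification * exp.falsification


def rank_experiments(experiments: list[Experiment]) -> list[Experiment]:
    """Highest priority first; ties are broken alphabetically so the order is deterministic."""
    return sorted(experiments, key=lambda e: (-priority(e), e.name))
```

Weighting falsification above feasibility favors experiments that could sink the hypothesis; a lab with a tight budget might reasonably shift the weights the other way.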

The Expert Validation Loop

Unutmaz’s team spent two weeks vetting the AI’s output. Every predicted interaction was cross-checked against the literature, and the suggested experiments were reviewed by three independent domain experts. “The AI’s logic held up remarkably well,” Unutmaz said. “We found only minor errors—a misannotated gene in one pathway and an overconfident prediction about a cytokine’s half-life. But those were caught quickly, and the overall framework was sound.”

This validation step is the linchpin. GPT-5 Pro, for all its prowess, is not immune to hallucination or gaps in its training data. In this case, the model’s training had included an outdated version of the Gene Ontology database, causing the misannotation. Unutmaz’s feedback loop allowed the AI to correct itself in subsequent iterations. The episode highlights a fundamental truth about scientific AI in 2025: it’s not about replacing researchers but augmenting them. The productivity gain comes from the AI’s ability to generate plausible, falsifiable hypotheses much faster than a human can, while the human remains the ultimate arbiter of truth.
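Part of a check like the one that caught the misannotation can be automated. The sketch below assumes both the model’s claims and a current annotation release are available as (gene, term) pairs, and flags every claim the current release does not contain; the gene and term names in the test are placeholders, and a flagged claim is a prompt for human review, not proof of an error.

```python
from collections import defaultdict


def build_index(reference: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each gene to the set of terms it carries in the current reference release."""
    index: dict[str, set[str]] = defaultdict(set)
    for gene, term in reference:
        index[gene].add(term)
    return index


def flag_unsupported(claims: list[tuple[str, str]],
                     reference: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the model's (gene, term) claims that the current reference does not contain."""
    index = build_index(reference)
    # .get() avoids inserting empty entries into the defaultdict for unknown genes.
    return [(gene, term) for gene, term in claims if term not in index.get(gene, set())]
```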

Broader Implications for Immunology and Biomedical Research

Unutmaz’s success isn’t an isolated anecdote. Across the biomedical community, GPT-5 Pro and similar models are beginning to dissolve bottlenecks that have plagued hypothesis-driven research for decades. In immunology, where datasets are growing larger and more multi-modal, the traditional cycle of data collection, analysis, and publication can take years. AI short-circuits the exploratory phase, allowing labs to test more ideas in less time.

Dr. Marina Sirota, a computational biologist at UCSF, notes: “The hardest part of single-cell studies is often the interpretation. You can generate terabytes of data, but turning that into a testable model requires deep biological intuition. AI models trained on the entire biomedical literature can bridge that gap, suggesting connections a human might never consider.”

Early adopters in pharma are already using GPT-5 Pro to repurpose existing drugs, identify new targets for autoimmune diseases, and design adaptive clinical trials. The ability to simulate genetic perturbations in silico before running costly experiments could shorten drug development timelines by 30–40%, according to a McKinsey estimate. However, regulatory and ethical frameworks are still catching up. The FDA has issued draft guidance on AI-derived hypotheses in Investigational New Drug (IND) applications, but formal policies are not expected until late 2025.

Training and Enterprise Adoption

OpenAI’s enterprise push with GPT-5 Pro has been aggressive. The model is available through dedicated API endpoints with HIPAA-compliant data processing, bulk analysis pipelines, and fine-tuning options for proprietary datasets. Several research institutions, including The Jackson Laboratory, have negotiated site-wide licenses, and early training programs are teaching principal investigators how to formulate precise prompts and critically evaluate AI outputs.

But the learning curve is real. Unutmaz admitted that his first attempts with GPT-4 were frustrating: “It would give me generic answers that sounded plausible but lacked mechanistic depth. With GPT-5 Pro, I learned to provide the right context—metadata, experimental timelines, negative results—and to ask for regression tests and sensitivity analyses. It’s like learning a new scientific instrument.”

This upskilling is crucial. A recent Nature survey found that 62% of biomedical researchers are interested in using AI for hypothesis generation, but only 18% feel they have the necessary skills. Institutions that invest in training and data hygiene will likely pull ahead.

Potential Pitfalls and Open Questions

Despite the promise, several concerns persist. First, the black-box problem isn’t fully solved. GPT-5 Pro’s chain-of-thought outputs are more transparent than those of previous models, but they can still obscure reasoning shortcuts. Second, there’s a risk of “hypothesis laundering,” in which plausible-sounding AI suggestions gain credibility simply because they come from a sophisticated model, bypassing rigorous scrutiny. Third, biases in training data could lead to systematic blind spots, particularly in under-researched diseases or populations.

Unutmaz’s approach—using AI as a brainstorming partner while maintaining a culture of skepticism—is a template for responsible adoption. His lab now runs a parallel “human-only” analysis stream for every AI-assisted project to control for over-reliance on the machine.

The Road Ahead

The T cell breakthrough is a proof point. As GPT-5 Pro and similar systems become more integrated into laboratory information management systems and electronic lab notebooks, the boundary between human and machine discovery will blur further. One emerging model is “continuous hypothesis generation”: AI tools that monitor live data streams from instruments and propose real-time adjustments to experiments—something Unutmaz’s lab is already piloting with a microfluidic T cell assay.
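Continuous hypothesis generation would need, at minimum, a way to notice when a live readout departs from its recent baseline. A minimal, hypothetical version is a rolling z-score monitor; the class name, window size, and threshold are assumptions, and turning a flag into a proposed experimental adjustment is beyond this sketch.

```python
from collections import deque
from statistics import mean, stdev


class DriftMonitor:
    """Flag readings more than `threshold` standard deviations away from a rolling baseline."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.baseline: deque[float] = deque(maxlen=window)  # oldest readings drop out automatically
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if `value` deviates from the current baseline; otherwise absorb it."""
        flagged = False
        if len(self.baseline) >= 2:  # stdev needs at least two points
            mu, sd = mean(self.baseline), stdev(self.baseline)
            flagged = sd > 0 and abs(value - mu) / sd > self.threshold
        if not flagged:
            self.baseline.append(value)
        return flagged
```

Flagged readings are kept out of the baseline so that a single spike does not inflate the standard deviation and mask the next departure.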

The ultimate vision, shared by OpenAI researchers in a recent white paper, is a “scientific co-pilot” that learns the specific context of a lab, its ongoing projects, and its historical data, then surfaces insights proactively. That future is not here yet, but with GPT-5 Pro’s early successes, it feels less like science fiction and more like an engineering milestone waiting to happen.

For immunologists and biologists toiling over complex datasets, the message is clear: the bottleneck is no longer the data or even the analysis—it’s the speed at which we can ask the right questions. And with AI-assisted hypothesis generation, that speed just got a massive upgrade.