GPT-Rosalind: OpenAI's New Model Outperforms Human Experts
OpenAI's GPT-Rosalind is a specialized life sciences reasoning model targeting drug discovery, genomics, and protein engineering, with a free Codex plugin for tool integration.
On April 16, 2026, OpenAI launched GPT-Rosalind, a frontier reasoning model built for the biological and chemical sciences. It targets drug discovery, genomics analysis, and protein engineering. Named after Rosalind Franklin, whose research helped reveal the structure of DNA, the model is the first release in what OpenAI calls the GPT-Rosalind life sciences model series.
Scientific Workflow Support
GPT-Rosalind is optimized for multi-step scientific workflows rather than general conversation. It is designed for evidence synthesis, hypothesis generation, experimental planning, and data analysis across chemistry, protein structure, and genomics. OpenAI's evaluations of the model measure reasoning over molecules, proteins, genes, pathways, and disease-relevant biology, as well as the ability to select and use the right computational tools and databases.
To connect the model to existing research environments, OpenAI released a Life Sciences research plugin for Codex, freely available on GitHub. The plugin provides modular skills covering human genetics, functional genomics, protein structure, biochemistry, clinical evidence, and public study discovery. It gives researchers access to more than 50 public multi-omics databases, literature sources, and biology tools within a single orchestration layer. If you build tools for AI agents, this plugin demonstrates how to structure domain-specific tool access for multi-step reasoning workflows.
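To make the idea of modular skills behind a single orchestration layer concrete, here is a minimal sketch of that pattern. It is not the plugin's actual code or API: the `Skill` and `SkillRegistry` classes, the skill names, and the stub handlers are all hypothetical, and a real skill would query a public database or literature source instead of returning an empty result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Skill:
    """One domain-specific capability an agent can call (hypothetical structure)."""
    name: str
    domain: str
    description: str
    run: Callable[[str], dict]


class SkillRegistry:
    """A single layer that routes agent requests to registered skills."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        if skill.name in self._skills:
            raise ValueError(f"duplicate skill: {skill.name}")
        self._skills[skill.name] = skill

    def by_domain(self, domain: str) -> List[str]:
        """List skill names in a domain so an agent can discover its options."""
        return sorted(s.name for s in self._skills.values() if s.domain == domain)

    def call(self, name: str, query: str) -> dict:
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name].run(query)


# Stub handlers (assumptions): real ones would call external data sources.
def lookup_protein_structure(query: str) -> dict:
    return {"skill": "protein_structure_lookup", "query": query, "results": []}


def search_literature(query: str) -> dict:
    return {"skill": "literature_search", "query": query, "results": []}


registry = SkillRegistry()
registry.register(Skill("protein_structure_lookup", "protein structure",
                        "Find known structures for a protein.",
                        lookup_protein_structure))
registry.register(Skill("literature_search", "clinical evidence",
                        "Search published studies.",
                        search_literature))

# A multi-step workflow is a sequence of skill calls whose outputs are kept
# as a trace for the model to reason over.
steps = [
    ("protein_structure_lookup", "hemoglobin beta"),
    ("literature_search", "hemoglobin beta variants"),
]
trace = [registry.call(name, query) for name, query in steps]
```

Keeping each skill behind a uniform `run(query) -> dict` signature is one design choice among several; it lets an agent discover skills by domain and chain them without knowing which database sits behind each one.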
Benchmark Results
OpenAI reports that GPT-Rosalind outperforms the general-purpose GPT-5.4 on several scientific evaluations and scores well against human expert baselines on two RNA tasks.
| Benchmark | Result | Notable Details |
|---|---|---|
| BixBench | Leading published score | Real-world bioinformatics and data analysis. |
| LABBench2 | Beats GPT-5.4 on 6 of 11 tasks | Strongest gain in CloningQA (molecular cloning protocol design). |
| RNA Sequence-to-Function | >95th percentile of human experts | Best-of-ten submissions, evaluated with Dyno Therapeutics. |
| RNA Sequence Generation | ~84th percentile of human experts | Evaluated with Dyno Therapeutics using uncontaminated sequences. |
The RNA evaluations used unpublished, uncontaminated sequences provided by Dyno Therapeutics, and the model's results were compared against 57 historical scores from human experts in the AI-bio field. Because the sequences were never published, they could not have appeared in the model's training data, so comparing against expert baselines on this novel data gives a stronger signal than standard benchmarks built on public datasets.
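The article does not say how the percentile figures were computed, so the following is only a sketch of one common definition: the share of baseline scores that fall strictly below the model's score. The 57 baseline values and the ten model submissions are synthetic numbers invented for illustration, not the real evaluation data, and the function names are hypothetical.

```python
from typing import Sequence


def percentile_rank(score: float, baseline: Sequence[float]) -> float:
    """Percent of baseline scores strictly below `score`."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    below = sum(1 for b in baseline if b < score)
    return 100.0 * below / len(baseline)


def best_of_n_percentile(submissions: Sequence[float],
                         baseline: Sequence[float]) -> float:
    """Percentile rank of the best submission out of n attempts."""
    if not submissions:
        raise ValueError("submissions must not be empty")
    return percentile_rank(max(submissions), baseline)


# Synthetic data (assumption): 57 made-up expert scores and 10 made-up
# model submissions on the same arbitrary integer scale.
human_scores = list(range(57))
model_submissions = [41, 48, 55, 52, 39, 50, 47, 54, 53, 44]

single = percentile_rank(model_submissions[0], human_scores)
best = best_of_n_percentile(model_submissions, human_scores)

print(f"first submission: {single:.1f}th percentile")
print(f"best of ten:      {best:.1f}th percentile")
```

Under this definition, taking the best of ten attempts can only match or raise the percentile of any single attempt, which is worth keeping in mind when reading a best-of-ten result next to single-attempt human scores.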
Trusted Access and Deployment
GPT-Rosalind is available as a research preview through ChatGPT, Codex, and the API for qualified customers. OpenAI restricts access to U.S. enterprise customers conducting legitimate scientific research with clear public benefit. Organizations must maintain governance, compliance, and misuse-prevention controls, restrict access to approved users within secure environments, and agree to the life sciences research preview terms.
Early customers include Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific. OpenAI is also exploring AI-guided protein and catalyst design with Los Alamos National Laboratory.
During the research preview, use of the model does not consume existing credits or tokens, subject to abuse guardrails. OpenAI plans to share pricing details as the program expands.
If you develop software for the life sciences, evaluate the Codex orchestration plugin against your current internal toolchains. The modular skills cover common repeatable workflows like protein structure lookup, sequence search, literature review, and public dataset discovery without requiring custom database connectors.
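One simple way to start that evaluation is a coverage comparison between the workflows named above and an inventory of your own tools. The sketch below assumes a hypothetical internal inventory; only the four plugin workflow names come from this article.

```python
# Workflows this article says the plugin's modular skills cover.
plugin_workflows = {
    "protein structure lookup",
    "sequence search",
    "literature review",
    "public dataset discovery",
}

# Hypothetical inventory of an internal toolchain (assumption):
# workflow name -> the internal tool that currently handles it.
internal_workflows = {
    "protein structure lookup": "in-house structure service",
    "sequence search": "custom alignment wrapper",
    "assay plate QC": "lab-specific pipeline",
}

# Workflows handled both internally and by the plugin: candidates for
# a side-by-side comparison.
overlap = sorted(w for w in internal_workflows if w in plugin_workflows)

# Workflows only the internal toolchain handles: these still need
# custom connectors.
internal_only = sorted(w for w in internal_workflows if w not in plugin_workflows)

# Workflows the plugin offers that the internal toolchain lacks.
plugin_only = sorted(plugin_workflows - set(internal_workflows))

print("compare side by side:", overlap)
print("keep internal tooling:", internal_only)
print("new capability:", plugin_only)
```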