News · May 4, 2026 · 2 min read

OpenAI o1 beats doctors at emergency diagnosis in Harvard study

AI achieved 67% accuracy versus 50-55% for physicians when diagnosing from identical patient records, but the study covers just 76 cases at one hospital.

By Agentic Daily · Verified Source: ETHealth

Our Take

Small sample size and single-hospital scope make this more proof-of-concept than clinical validation.

Why it matters

Emergency departments face diagnostic pressure with incomplete information, exactly where AI pattern recognition could reduce errors. The 17-point accuracy gap suggests real potential if the results hold across diverse populations.

Do this week

Healthcare CIOs: Audit your current diagnostic decision support tools this month so you can benchmark capabilities against emerging AI alternatives.

OpenAI o1 diagnosed 67% of emergency cases correctly

Harvard Medical School researchers compared physicians against OpenAI's o1 model using real emergency department cases. Both groups received identical electronic health records with vital signs, demographics, and clinical notes, but conducted no physical examinations.

The AI achieved correct or near-correct diagnoses in 67% of the 76 patient cases, compared to 50-55% for doctors (per the Harvard study). With additional patient information, AI accuracy rose to 82% while physicians reached 70-79%, though this difference wasn't statistically significant.

Treatment planning showed an even wider gap. The AI scored 89% on case study analysis versus roughly 34% for physicians using conventional resources (study data).

In one case, a patient with worsening lung symptoms was initially thought to be failing treatment. The AI identified an alternative explanation linked to the patient's lupus history, which proved correct.

Text-only diagnosis exposes AI's clinical reasoning limits

The AI's advantage appeared strongest in high-pressure, information-limited scenarios where cognitive biases typically affect human judgment. It processes large data volumes quickly and evaluates multiple diagnostic possibilities simultaneously.

But the system operated purely on text-based records, missing physical cues like patient appearance, behavior, or visible distress. This positions it as a second-opinion tool rather than a clinical replacement.

The study's scope raises questions about broader applicability. A single hospital with 76 cases leaves performance across diverse populations untested.

Integration questions outweigh accuracy gains

Researchers Arjun Manrai and Adam Rodman see AI supporting clinical decision-making, while Wei Xing cautions against assuming readiness for routine use.

The core challenge isn't accuracy but accountability. No framework yet defines responsibility when AI-assisted decisions go wrong. Emergency departments need liability clarity before deployment, regardless of diagnostic performance.

The treatment planning gap (89% vs 34%) suggests AI could help most in complex cases where physicians struggle with resource constraints. But emergency medicine demands split-second decisions based on incomplete information, exactly where black-box AI explanations fall short of clinical requirements.

#Healthcare AI · #LLM · #Research · #Enterprise AI