AI Outperforms Doctors in Emergency Triage Diagnoses, Harvard Study Finds
Researchers describe the findings as a “profound change in technology that will reshape medicine.”
From iconic TV portrayals like George Clooney in ER to Noah Wyle in The Pitt, emergency department doctors have long been celebrated as heroes. Recent research, however, suggests this dynamic may be shifting.
A pioneering study conducted at Harvard has revealed that artificial intelligence (AI) systems outperformed human doctors in high-pressure emergency medicine triage, providing more accurate diagnoses during the critical moments when patients first arrive at the hospital.
The results, characterized by independent experts as representing “a genuine step forward” in AI clinical reasoning, emerged from trials comparing the diagnostic responses of hundreds of doctors against an AI system.
The study’s authors, publishing their findings in the journal Science, stated that large language models (LLMs) “have eclipsed most benchmarks of clinical reasoning.”
Study Details and Diagnostic Accuracy
One key experiment involved 76 patients who arrived at the emergency room of a Boston hospital. An AI system and two human doctors were each given the same standard electronic health record for every patient. These records typically included vital signs, demographic data, and brief nurses’ notes explaining the patient’s reason for visiting.
The AI identified the exact or very close diagnosis in 67% of cases, outperforming the human doctors, who achieved correct diagnoses between 50% and 55% of the time.
The AI’s advantage was especially notable in triage situations requiring rapid decisions based on minimal information. When more detailed data was available, the AI—OpenAI’s o1 reasoning model—achieved an 82% accuracy rate, compared to the 70% to 79% accuracy range of expert human doctors, although this difference was not statistically significant.
Additionally, the AI outperformed a larger group of human doctors in devising longer-term treatment plans, such as antibiotic regimens or end-of-life care strategies. In this test, the AI and 46 doctors reviewed five clinical case studies. The AI scored 89% in treatment planning, significantly higher than the 34% achieved by the doctors using conventional resources like search engines.
Limitations and Future Role of AI in Medicine
Despite these promising results, the researchers emphasized that this does not signal the end of emergency doctors’ roles. The study assessed AI and human performance only on patient data communicated as text. The AI was not tested on interpreting physical cues such as a patient’s level of distress or visual appearance, meaning it functioned more as a clinician providing a second opinion based on documentation.
“I don’t think our findings mean that AI replaces doctors,” said Arjun Manrai, one of the lead authors and head of an AI lab at Harvard Medical School. “I think it does mean that we’re witnessing a really profound change in technology that will reshape medicine.”
Dr. Adam Rodman, another lead author and physician at Boston’s Beth Israel Deaconess Medical Center where the study was conducted, described AI LLMs as among “the most impactful technologies in decades.” He predicted that over the next ten years, AI would not replace physicians but would become part of a new “triadic care model … the doctor, the patient, and an artificial intelligence system.”
Case Example and Current AI Adoption
In one notable case from the Harvard study, a patient presented with a pulmonary blood clot and worsening symptoms. Human doctors suspected that anticoagulants were ineffective, but the AI identified a critical detail overlooked by humans: the patient’s history of lupus could be causing lung inflammation. This AI diagnosis was later confirmed as correct.
Currently, nearly one in five U.S. physicians use AI to assist diagnosis, according to a report published last month. In the UK, 16% of doctors use AI daily and an additional 15% use it weekly, with “clinical decision-making” among the most common applications.
UK doctors’ primary concerns regarding AI include potential errors and liability risks. Despite billions of dollars invested in AI healthcare companies, questions remain about the implications of AI mistakes.
“There is not a formal framework right now for accountability,” said Rodman, who also emphasized that patients ultimately “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.”
Expert Commentary and Cautions
Professor Ewen Harrison, co-director of the University of Edinburgh’s Centre for Medical Informatics, noted the study’s significance, stating that “these systems are no longer just passing medical exams or solving artificial test cases. They are starting to look like useful second-opinion tools for clinicians, particularly when it is important to consider a wider range of possible diagnoses and avoid missing something important.”
Dr. Wei Xing, assistant professor at the University of Sheffield’s School of Mathematical and Physical Sciences, raised concerns that doctors might unconsciously defer to AI answers rather than independently analyzing cases.
“This tendency could grow more significant as AI becomes more routinely used in clinical settings,” he said. He also highlighted the lack of data on which patient groups the AI struggled with, such as elderly patients or non-English speakers.
Dr. Xing cautioned that the study “does not demonstrate that AI is safe for routine clinical use, nor that the public should turn to freely available AI tools as a substitute for medical advice.”