
In a Harvard study, AI provided more accurate diagnoses than emergency room doctors

New research examines how large language models perform in a variety of medical situations, including real emergency room cases — where at least one model appears to be more accurate than human doctors.

The study was published this week in Science by a team of researchers led by doctors and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they ran a series of tests to measure how OpenAI's models compare to human doctors.

In one experiment, the researchers focused on 76 patients who arrived at the Beth Israel emergency room, comparing the diagnoses given by two attending physicians to those generated by OpenAI's o1 and GPT-4o models. The diagnoses were then reviewed by two other physicians, who did not know which came from humans and which came from AI.

“At each diagnostic timepoint, o1 performed as well as or better than the two attending physicians and GPT-4o,” the study said, adding that the difference “is most pronounced at the first diagnostic timepoint (initial ER triage), where less information about the patient is available and there is more urgency to make the right decision.”

At a Harvard Medical School press briefing about the study, the researchers emphasized that they “did not pre-process the data at all” — the AI models were given the same information found in the electronic medical records at the time of each diagnosis.

With that information, the o1 model was able to provide a “correct or very close diagnosis” in 67% of cases, compared to one doctor who had a correct or close diagnosis 55% of the time, and another who hit the mark 50% of the time.

“We tested the AI model on almost every benchmark, and it outperformed previous models and our physician benchmarks,” said Arjun Manrai, head of the AI lab at Harvard Medical School and one of the study’s lead authors, in a press release.


To be clear, the study did not say that AI is ready to make real life-or-death decisions in the emergency room. Instead, it said the findings show “an urgent need for prospective trials to test this technology in real-world patient care settings.”

The researchers also noted that they only studied how well the models performed when given text-based information, and that “existing research suggests that current foundation models are limited in their ability to reason over non-textual inputs.”

Adam Rodman, a Beth Israel physician and one of the study’s lead authors, told the Guardian that “there is currently no formal accountability framework” for AI diagnoses, and that patients “still want humans making life-or-death decisions [and] guiding them through challenging treatment decisions.”
