The authors of two recent studies of the performance of the Epic Sepsis Model (ESM), a commonly used early warning tool for sepsis, reached different conclusions: One research group found the ESM fails to identify many sepsis cases beyond those clinicians detect on their own, while another reported the tool enhances care.

Such results are confusing and no doubt leave some clinicians wondering whether they should continue to expend the resources needed to integrate the tool into their workflows. However, a closer look at the designs and findings of the investigations sheds light on why the conclusions differed and what information clinicians can glean from this work, particularly with respect to improving their own approaches toward the early detection and management of sepsis.

Investigators from Case Western Reserve University and MetroHealth in Cleveland followed 598 patients who presented to the ED over a period of five months in 2019. Patients were randomized to either standard care for sepsis or to an intervention group in which the early warning flag for sepsis in the electronic medical record (EMR) was accompanied by a notification to a pharmacist.1

These researchers found patients in the intervention group received their antibiotics significantly faster than patients in the standard care group. Patients in the intervention group also spent more days alive and out of the hospital than those who received standard care.

The care team regards the ESM as a tool, just like any other clinical decision support resource one might use. “We wanted to understand how it worked in our practice. The most robust way to do that was through a randomized, controlled [study],” explains Yasir Tarabichi, MD, MSCR, lead author and director of clinical informatics for research support in the MetroHealth System. “We just needed to know that it was a worthwhile thing to add to our workflow.”

While the results show that the tool was worthwhile when used as part of MetroHealth’s sepsis response (and the tool has since been fully implemented in the ED there), Tarabichi emphasizes it is not the tool that saved lives in the study. “Our care team members did that. Because we ran the study the way that we did, we know that they were more successful with the sepsis model implemented and baked into their response pattern,” he says.

Clinicians can use an early warning system to improve sepsis-related outcomes, but Tarabichi notes how one uses the tool likely contributes to its success. “Our study shows that [improved outcomes] are possible. Given the robust study design, it also potentially provides a roadmap that other systems can use when they are configuring, implementing, and evaluating systems like this in their own practices.”

The conclusions by Tarabichi and colleagues appear to run counter to the results of a different study completed by researchers at the University of Michigan.2 In that case, investigators conducted a validation study to determine how well the ESM tool performed as an early warning system in identifying sepsis in hospitalized patients. Researchers analyzed a retrospective cohort of 27,697 patients with 38,455 hospitalizations between Dec. 6, 2018, and Oct. 20, 2019. The authors concluded the ESM did not perform well.

Andrew Wong, MD, lead author and clinician in the department of internal medicine at Michigan Medicine, says the model was triggering too often while catching few cases beyond those the health system’s clinicians had already identified. “As our study showed, the ESM raised alerts on 18% of all hospitalized patients, which is a significant burden for providers. At the same time, it only identified sepsis in 7% of patients with sepsis who were missed by a clinician, based on timely administration of antibiotics,” he reports. “That means that the model was firing significantly often but only had minimal benefit above usual clinical practice.”

Wong adds that alert fatigue is important to consider when implementing an early warning tool: it can interfere with providers’ ability to deliver care, dilute the importance of other alerts, and contribute to burnout.

To determine why the two studies delivered different results, consider the nature of the studies and some specific definitions investigators used, explains Karandeep Singh, MD, MMSc, who worked with Wong on the study.

“When we think about evaluating prediction models or severity models that are used as part of early warning systems, we usually do two things,” notes Singh, associate professor of learning health sciences and internal medicine at Michigan Medicine. “The first thing we do is evaluate how well the model predicts the outcome that is clinically relevant to us. Then, if we want to know how well the model works in practice, we link that model to an intervention and see how well the model performs ... Our study really just focused on the first [part].”

On the other hand, Singh notes the primary focus of the Case Western/MetroHealth researchers was the second part of the equation, although they did report some validation data on how well the model performs when running quietly, without being shown to anyone. Looking at the data from this smaller part of the Case Western study, Singh notes there are other significant differences between the two investigations. “In our study, we showed that the model misses two-thirds of patients with sepsis, whereas [Case Western researchers] showed that it only misses 10% of patients,” Singh observes.

Singh suspects one reason for this difference is Tarabichi and colleagues considered all sepsis predictions from the model, even those that were made after a patient developed sepsis. This can occur because when a clinician recognizes that someone has sepsis and then prescribes antibiotics, the model reacts to that by raising the score.

“I think they were capturing a fair number of people that had already been recognized as having sepsis ... that is a key difference [from our study],” Singh says.

Consequently, the number of sepsis cases the ESM identified in the study by Tarabichi and colleagues was in line with the tool developer’s claims. Meanwhile, the University of Michigan researchers found the developer’s numbers inflated the performance of the ESM.

However, Singh stresses what Case Western researchers demonstrated in the larger focus of their study is important. “Even with a prediction model that has modest performance ... when you link that model to an intervention at a place where your median time to antibiotics is three hours, you can actually improve that time. That is not insignificant,” Singh says. “They linked the model [to an intervention]. They only used it in the ED, they only sent one alert when the model fired, and they actually delivered the model output directly to a pharmacist and gave the pharmacist leeway over whether to prescribe antibiotics based on a pathway that they had set forth.”

Tarabichi explains he and colleagues were intent on leveraging pharmacist expertise in their sepsis response because of the crucial role pharmacists play in ensuring the right antibiotics are selected in each case and in accelerating treatment. “Their ability to make sure it is the right drug at the right dose, and that the delivery method is the correct one, is the first part of that,” Tarabichi explains. “The second part of that is their ability to quickly get the antibiotics ready and administered.”

Both research groups agree that, tested through a robust study design, the ESM proved effective when linked to Case Western’s intervention. But they also agree that any early warning tool must be properly vetted before it is selected. “We should ask for better data and better studies,” Tarabichi argues. “We don’t give people medicines because they look like they worked in hindsight.”

Singh adds it is important to look beyond the topline data from developers. “When you are faced with choosing one of several [early warning] models, you shouldn’t necessarily settle on the one with the highest numbers. You’ve got to see how the models actually perform,” he says. “I think most clinicians would agree that deciding that a patient needs antibiotics after they have already received antibiotics is probably not all that helpful.”

Further, given that developers of such tools evaluate them differently, clinicians should push their health systems to vet developer claims more consistently. “Overall, I think it is a positive development that we have an effective intervention [in the Case Western approach], but that doesn’t mean the developers are off the hook,” Singh says.

Emily Barey, RN, MSN, director of nursing informatics at Epic Systems Corporation, says that when validating the ESM, Tarabichi and colleagues used a definition of sepsis that generally is more accepted by the industry than the one used in the University of Michigan study. She also underscores the positive results Case Western investigators observed in their intervention arm.

“Any improvement in sepsis care is a good improvement, but being able to do it without increasing resources is another contribution this study will make to healthcare organizations considering using AI to improve their care quality,” Barey says.


  1. Tarabichi Y, Cheng A, Bar-Shain D, et al. Improving timeliness of antibiotic administration using a provider and pharmacist facing sepsis early warning system in the emergency department setting: A randomized controlled quality improvement initiative. Crit Care Med 2021; Aug 20. doi: 10.1097/CCM.0000000000005267. [Online ahead of print].
  2. Wong A, Otles E, Donnelly JP, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med 2021;181:1065-1070.