Quality can be hard to define in any arena, and in medicine there can be so many variables that pinning down which hospital is better than another becomes a herculean task. That is one reason hospitals have been bombarded with a slew of quality measures and metrics that promise to distill all those variables into hard numbers and rankings that can be used to assess a hospital’s quality and patient safety.
Once you have those hard numbers, comparing one hospital to another is much easier, right?
That’s the intention, but the reality appears to be much different. Quality measures commonly used to evaluate hospitals and other healthcare institutions are likely to misinform patients, misclassify hospitals, misapply financial data, and cause unwarranted reputational harm to hospitals, says Bradford Winters, MD, PhD, associate professor of anesthesiology and critical care medicine at Johns Hopkins Medicine in Baltimore. He is the lead author of a recent study that found only one of 21 quality measures reliably indicated a hospital’s patient safety profile. The study focused on the Agency for Healthcare Research and Quality (AHRQ) patient safety indicators and the CMS hospital-acquired condition (HAC) measures.
In addition to carrying significant influence on reimbursement, the measures evaluated in the study are used to determine scores in widely publicized rankings such as U.S. News and World Report’s Best Hospitals, Leapfrog’s Hospital Safety Score, and the CMS Star Ratings. A similar study from 2015 also concluded that “there is limited evidence that many ‘quality’ measures — including those tied to incentives and those promoted by health insurers and governments — lead to improved health outcomes.” (For more on those studies, see the story later in this issue.)
The study findings mean the healthcare community should re-evaluate whether these measures should be used to compare hospital quality and safety, Winters says. The way the measures depend on billing codes to identify adverse events is proving to be invalid, he says, primarily because coding is not consistent from one facility to another.
The measures were originally intended to help a hospital assess its patient safety internally, using the billing codes to identify that the hospital had a high number of respiratory failures or some other issue so the problem could be addressed within the facility, Winters notes. Now these performance measures are used in formulas that determine denial of reimbursement, but they lack the validity to justify that use, he says.
Winters and his colleagues used 80% as the threshold for validity, based on research that identified 80% confidence as the minimum necessary for a physician to make a clinical decision.
“With hospitals running on very tight margins, to deny reimbursement to a hospital based on a quality measure that is wrong at least 20% of the time, if not 40% to 60% of the time, is ludicrous,” he says. “Hospitals shouldn’t be punished based on a measure that lacks validity. There should be an open discussion about how these incentive programs are going to move forward.”
Frustration Over Safety Scores
The Johns Hopkins research only confirms what many clinicians and quality professionals already knew. It is not difficult to find healthcare leaders who are frustrated over how these quality measures are used to indicate patient safety and quality. One example is the University of Texas MD Anderson Cancer Center in Houston, where Thomas W. Feeley, MD, head of anesthesiology and critical care and head of the Institute for Cancer Care Innovation, points to the highly publicized hospital rankings from U.S. News and World Report. The magazine ranks MD Anderson as No. 1 in “best hospitals for adult cancer,” yet it is given a patient safety score of two out of a possible five.
The second-ranked hospital, Memorial Sloan Kettering Cancer Center in New York City, has a patient safety score of 4/5, and the third-ranked Mayo Clinic in Rochester, MN, has a 5/5 patient safety score. Johns Hopkins Hospital in Baltimore, at number six in the rankings, has a patient safety score of 1/5, a rock-bottom score that is hard to reconcile with the hospital’s overall quality and reputation.
Feeley was not surprised by the Johns Hopkins study results showing the measures to be invalid.
“That is totally correct,” he says. “When U.S. News and World Report started reporting numbers that made us look like we’re not a safe hospital, we looked at the data and saw that it’s all administrative claims data. It’s about how well you code, not how safe your hospital is for your patients.”
In response, MD Anderson devoted resources to making sure that when patient safety indicators (PSIs) are flagged, someone clinically reviews the cases. Feeley calls that “an incredible waste of money, just driving up the cost of healthcare.” But he says it is necessary to ensure that the data reflect MD Anderson’s quality of care as accurately as possible. MD Anderson’s internal review made clear that the data just don’t show what people think they show, he says.
“So much of it was related to how we code, and sometimes we coded for billing reasons that aren’t necessarily indications of patient safety problems,” Feeley explains. “Measuring is important, but we’re measuring the wrong things. When you measure from coding data, there inherently will be problems.”
Billing Data Can Mislead on Safety
One example was postoperative respiratory failure. In a move to improve patient safety, many patients in a cancer care center are not extubated immediately after surgery. They’re left on a mechanical ventilator so they can emerge slowly from anesthesia and clinicians can make sure the surgery went well. Yet when they get to the ICU, the physicians who see them there document postoperative respiratory failure because that is the best strategy for optimum reimbursement in a fee-for-service environment.
“That doesn’t mean they had some awful thing go wrong,” Feeley explains. “That meant that people were providing really outstanding care. But when that documentation gets to the coder and they note postoperative respiratory failure, that triggers a patient safety indicator. How ridiculous is that, that when doctors are doing a good thing the hospital gets penalized because the measure can’t differentiate between that and the other postoperative respiratory failure that really is an error?”
Quality measures should be developed to focus more on outcomes, Feeley says. Several organizations are working toward that goal, but no wholesale change will come soon, he says.
“As long as the scores are coming from coded data, the outcome will be the same,” Feeley says. “Garbage in, garbage out.”
Some Poor Outcomes Inevitable in Healthcare
The use of PSIs and process measures to determine quality and safety is inherently flawed, says Donald E. Fry, MD, executive vice president for clinical outcomes at MPA Healthcare Solutions, a healthcare analytics company in Chicago that has pioneered quality assessment and predictive models.
PSIs are self-reported and have no standardized definitions, while process measures can oversimplify the assessment of what represents quality care, he says. As an example, he notes that there are nearly a hundred variables that determine whether a patient gets a postoperative infection, but the applicable process measure focuses on only a few. Those few variables, in effect, determine whether the hospital was in compliance with infection control standards.
“The reality is that the cascading and interactive list of potential factors is such that you can be completely compliant with the things that government is measuring — hair removal, antibiotic administration, and so forth — and you will still have high infection rates,” Fry says. “Similarly, you may fail to comply with all the processes in one or two of the sample cases and still have effective outcomes.”
Fry also notes that the current measures can unintentionally hamper the effort to improve patient safety. The hospital that improves its detection and reporting of PSIs can damage its reputation and reimbursement, he says. More meaningful evaluation of provider performance would come from objective measures that are not self-reported, Fry says. The true measure of whether a facility’s healthcare is suboptimal can only come from comparing performance to that of the collective group, he says, not from self-reported data that isolates the assessment of each hospital. (See the stories later in this issue for more concerns about self-reported data and for what type of data is most reliable.)
Assessments would be more reliable if they were based on factors such as risk-adjusted, prolonged length-of-stay outliers, Fry suggests. If a patient stays in the hospital three standard deviations longer than the normal stay, that is invariably associated with a complication of care that also leads to increased resource utilization and increased morbidity, Fry says.
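To make the three-standard-deviation rule concrete, here is a minimal Python sketch. It is an illustration only, not Fry's or MPA Healthcare Solutions' actual method: it assumes the "normal stay" can be summarized by the mean and sample standard deviation of a reference group of comparable patients, it leaves out the risk adjustment Fry describes, and the function names and sample numbers are hypothetical.

```python
from statistics import mean, stdev


def prolonged_stay_threshold(reference_stays, k=3.0):
    """Return the length of stay (days) above which a stay counts as prolonged.

    Assumption: the "normal stay" is the mean of a reference group of
    comparable patients, and the cutoff is k sample standard deviations
    above that mean. Risk adjustment is deliberately omitted.
    """
    return mean(reference_stays) + k * stdev(reference_stays)


def flag_prolonged_stays(stays, threshold):
    """Return the positions of stays that exceed the threshold."""
    return [i for i, days in enumerate(stays) if days > threshold]


# Hypothetical reference group: lengths of stay in days for similar cases.
reference = [3, 4, 4, 5, 5, 5, 6, 6, 7, 5]
threshold = prolonged_stay_threshold(reference)

# Hypothetical stays to evaluate against that reference.
flagged = flag_prolonged_stays([4, 6, 21, 5, 8], threshold)
print(f"threshold: {threshold:.2f} days, flagged positions: {flagged}")
```

The cutoff is computed from a separate reference group rather than from the stays being evaluated, because a single extreme stay in a small sample inflates that sample's own standard deviation and can hide itself.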
“One of the unfair things about current public disclosures about hospitals is that hospitals don’t really know what their outcomes are. Tracking patients after discharge is not easy. For example, 20% to 40% of readmissions occur at a hospital other than where the patient received the index care, and about 40% of emergency room visits after major operations occur at another hospital,” he says. “If somebody dies 90 days after discharge without readmission, the doctors and certainly the hospital don’t even know that has even happened.”
Measuring quality of care with quantitative data will always provide an incomplete and potentially misleading assessment to the consumer, says Peter Bonis, MD, chief medical officer of clinical effectiveness with the Boston office of Wolters Kluwer Health, which provides data, software, and consulting for healthcare organizations. The intention is good, but such measures can never incorporate the many values that go into a hospital’s quality of care and how consumers choose providers, he says.
“Even if things go well and you believe you are receiving high-quality care, you don’t as a lay person have an external benchmark from which to know whether you received contemporary evidence-based care that was optimized to you as an individual,” Bonis says.
Recent moves to tie quality measures to reimbursement will push hospitals to address the accuracy of quality measures more directly, Bonis expects, through the use of more chart review and other means of validating that the data produces a fair assessment of the hospital’s experience. (See the story later in this issue for more on how hospitals can strive for more accurate portrayals.)
“Right now there is not a whole lot of rancor about the question of accuracy because there was not much financial risk tied to the scores. Hospitals wanted to be ranked well, but there was always doubt about how much consumers really used this information to choose their healthcare providers,” Bonis says. “But now with value-based care and risk-based contracts, I think we’re going to see more disputes from hospitals as to whether or not the way they are being scored is correct.”
- Peter Bonis, MD, Chief Medical Officer, Clinical Effectiveness, Wolters Kluwer Health, Boston. Telephone: (781) 392-2088. Email: firstname.lastname@example.org.
- Thomas W. Feeley, MD, Head of Anesthesiology and Critical Care, Head of the Institute for Cancer Care Innovation at MD Anderson Cancer Center, Houston. Telephone: (713) 792-7115. Email:
- Donald E. Fry, MD, Executive Vice President for Clinical Outcomes, MPA Healthcare Solutions, Chicago. Telephone: (312) 467-1700. Email: email@example.com.
- Bradford Winters, MD, PhD, Associate Professor of Anesthesiology and Critical Care Medicine, Johns Hopkins Medicine, Baltimore. Email: firstname.lastname@example.org.