Fiscal Fitness: How States Cope

The evolution is now: P4P programs undergo 'dramatic shift' from process to outcomes

Pay-for-performance (P4P) programs are undergoing a striking transformation, moving from a focus on processes of care to one emphasizing patient outcomes, cost efficiency, and use of information technology, researchers are finding.

Investigators who surveyed 27 early adopters of P4P to assess evolution of their payment reward systems between 2003 and 2006 found that performance measures used to evaluate and reward physicians and hospitals are indeed subject to change. For example, the study found a sharp increase in use of outcome measures to reward physician and hospital behavior, and less focus on processes such as keeping mammography screening rates high. In 2003, sponsors representing 59% of enrollees targeted health outcomes; by 2006, sponsors representing 94% did. P4P adopters now are basing rewards on such things as whether diabetes patients actually attain healthy cholesterol and blood pressure levels, not just whether a doctor has prescribed pills.

"This is a pretty dramatic shift," says Meredith Rosenthal, PhD, the lead author of the study and associate professor of health economics and policy at the Harvard School of Public Health. "These kinds of outcomes are much more meaningful than process measures of quality for capturing health status and predicting important outcomes including major adverse events such as heart attacks and death."

Dr. Rosenthal and her colleagues also found increasing numbers of P4P plans using cost-efficiency measures as a target for rewarding physicians and hospitals. In 2003, they report, sponsors representing 60% of enrollees included cost-efficiency measures as a prominent aspect of their P4P arrangements. By 2006, sponsors representing 92% of enrollees were using cost of care in calculating a physician's performance score.

"This is a very significant change," Dr. Rosenthal says. "But it may also jeopardize the credibility of these programs" because physicians may only see them as a means of cost control.

Most programs still continuing

The researchers interviewed respondents for 27 early adopter programs and found that 24 were still offering P4P. Of the three programs that were closed, one was canceled because of a perception that its market share was too small for the payment incentives to influence the targeted physicians. In another, the provider organization that had been at the center of P4P left the health network. And in the third instance, the program was terminated after a three-year pilot and a replacement was not yet in place, although it was expected that one would be developed.

Sponsors of the 24 surviving programs were geographically diverse and varied in size from 52,000 enrollees to 11 million enrollees.

Although primary care physicians continue to be the most common provider type subject to P4P, the inclusion of specialists increased between 2003 and 2006. Cardiologists and general surgeons were the most commonly mentioned specialists included in P4P programs, although gastroenterologists and orthopedists also were mentioned by several sponsors. The biggest barrier to enrolling more specialists, sponsors said, is the lack of appropriate nationally accepted quality measures. Sponsors also said attributing patients' receipt of recommended care or outcomes to specific physicians is more difficult with specialists than with primary care physicians.

For physicians and medical groups, Dr. Rosenthal says, the most commonly targeted outcome measures were intermediate outcomes such as HbA1C, LDL cholesterol, and blood pressure control. For hospitals, complication and in-hospital mortality rates were frequently targeted. With some exceptions, the measures incorporated into early adopters' P4P schemes revealed a focus on chronic illness treatment guidelines and preventive medicine. In particular, all sponsors interviewed by the researchers incorporated indicators of compliance with diabetes or asthma care guidelines, and most also included process measures aimed at recommended preventive services.

Performance measures added

Programs covering 99% of enrollees had increased the total number of measures that were factored into calculating performance bonuses or withholds since 2003. In addition to supplementing measures, respondents representing 33% of enrollees said they had eliminated measures from their programs. For example, patient satisfaction scores were dropped by three sponsors (17% of enrollees). Two said they dropped the measure because of a lack of variation in scores across providers, while a third dropped it because of the expense of collecting patient survey data. Some payers also reported eliminating the following measures because scores were consistently very high: counseling for tobacco cessation, well-baby and well-child visits, mammography, cervical cancer screening, colorectal cancer screening, and combination measles, mumps, and rubella vaccination.

The majority of programs (covering 58% of enrollees) have boosted the pool of money available for performance-based pay, even after accounting for the fact that more providers are often being drawn into the program. P4P bonuses typically were about $1.40 per member per month and ranged from 20 cents to $15 per member per month. Still, despite reported increases since 2003, P4P remains a very small portion of total provider payments.

Interestingly, only three of the programs surveyed have been formally evaluated by independent researchers. For one, the evaluation was still continuing with no results available, while another that found improvements in hospital process and outcome measures did not have a control group. The third study found statistically significant improvements in diabetes, mammogram rates, and coronary disease measures compared with a control group, but was not able to determine whether to attribute the improvements to the incentives or to the education and direction that accompanied that program's launch.

Too early to see change?

Several sponsors said it is still too early to expect changes in performance. It also was widely acknowledged that the dynamic nature of P4P arrangements, coupled with shifts in benefit design, public reporting, and other aspects of the health system, makes identification of P4P's impact challenging, if not impossible. Where some type of evaluation had been undertaken, Dr. Rosenthal says, outcomes were arguably positive. Respondents covering 38% of enrollees reported solid gains, with another 42% finding mixed results, and 20% finding no effect. Clinical areas where improvement was documented included diabetes care, cancer screening, and inpatient cardiac care.

"Lacking strong evidence of impact on quality improvement," Dr. Rosenthal says, "most respondents reported that their programs were sustained by at least one of three motivations. The first is a belief that if P4P is not yet improving quality, it is because they have yet to find the right technical specification and that if they keep tweaking the program, by adding money, coordinating with other payers' P4P programs, or changing the targeted measures, they will eventually formulate an effective program. The second motivator is a sense that even if P4P is not helping improve quality, paying more for higher quality is simply fairer than paying solely for the quantity of services provided. And the third motivator is a desire to use P4P as an intermediary step toward other goals, such as making performance transparent to consumers and purchasers or developing a tiered payment system."

All 24 sponsors still running programs told the researchers they intended to continue with the approach in the near term. Indeed, Dr. Rosenthal says, 14 of them anticipated expanding use of P4P across their provider networks and four explicitly mentioned plans to increase the share of payments to be allocated based on performance.

Is there a long-term role for P4P?

However, she reports, there were two distinct and opposing camps on the long-term role that P4P should play. While one group expected that some form of P4P would become a permanent reimbursement system feature, an opposing group took the view that the regulatory P4P model would give way to a health care market where both consumers and payers would be able to distinguish high-value from low-value providers and that high-value providers would be able to command a price premium. Members of that group said they firmly believed in the power of publicly reported provider performance data. In fact, Dr. Rosenthal says, three sponsors identified performance data transparency as a more important lever for performance improvement than payment incentives. In contrast, she says, four sponsors expressed serious doubts about the public's ability to understand health care quality and efficiency reports.

Respondents identified three major challenges to building and maintaining effective P4P programs: 1) overcoming physician resistance; 2) determining the necessary size of incentive pools to capture providers' attention; and 3) finding resources necessary to continue program funding.

When the researchers asked them what lessons they had learned from their P4P experiences, some respondents focused on the importance of promoting provider involvement as a means of reducing opposition to the programs. The second most prevalent lesson noted was in the area of selecting measures. For example, four respondents stressed the importance of using clinical rather than administrative data, in part to overcome physicians' concerns about the validity of performance measurement. Four sponsors also cited the need to use only nationally accepted measures such as those approved by the National Quality Forum so that physicians will be satisfied and there can be coordination with other programs.

Five respondents said the most serious threat to P4P sustainability was the absence of a demonstrable return on investment. Although only three sponsors had documented a return on investment, another five said quantifying such net savings was a future goal.

It's time to evaluate

Dr. Rosenthal says her findings yield several important implications for policy, practice, and research. First, as early adopters have increased levels of payment and migrated to health outcomes and costs as targets, there is a need to add evaluation to program design. "Program evaluation could assess both the impact of the programmatic changes on targeted measures and unintended, adverse consequences," she says. "With increasing focus on outcomes and cost, we also need better risk adjustment to address real and perceived concerns about patient differences that might undermine quality incentives."

Second, she says, many early P4P adopters have made efforts to strengthen their programs by increasing the size of the incentive pool, although such increases remain modest in terms of the dollars paid out. Some sponsors have also begun to reward improvement explicitly, alongside attainment of benchmark performance levels.

Finally, expansion in the comprehensiveness of measure sets among early adopters means there is less latitude for providers to focus on a single population group or condition to maximize P4P payments. Theoretically, she says, that should motivate more holistic approaches to quality improvement, which are viewed by many as critical to making real progress in achieving national quality goals.

"Despite the increasing scope of measurement, the range of measures and conditions covered by the P4P programs we studied remains relatively narrow," Dr. Rosenthal says, "largely because of the limits of current measure sets. P4P may become important in those clinical domains for which there is, or will be, sufficient evidence to support meaningful process or outcome measurement. However, it should probably be acknowledged that some areas of medicine, for example, where patients' preferences greatly affect the appropriate course of treatment, may never be well suited for performance incentives."

The research also highlighted developments that could undermine P4P. Dr. Rosenthal says many respondents were focused on making a business case for P4P, a concern that she says is clearly driven by employers that are alarmed at the continued pace of rising health insurance premiums. Although efficiency is an important component of quality, she says, emphasis on reducing the cost of care may ultimately undermine the credibility of these programs with physicians and other stakeholders.

Dr. Rosenthal tells State Health Watch that quality measures have been an issue for physicians since P4P began. "It's hard to sort through what people are really thinking," she says. "Orienting P4P around weak quality measures is problematic." She says she finds it encouraging that there has been a dramatic increase in resources and attention paid to quality measures.

Dr. Rosenthal candidly recognizes that even better quality measures might not make doctors any more receptive to P4P. "No one wants to be measured," she says, "especially when they are found lacking. It is a shared concern, so it is productive to focus on it."

Asked about the relative lack of program evaluations to date, Dr. Rosenthal says evaluating them is as possible now as it will ever be, and that many of the programs have been in place enough time for an evaluation to be done. "Some programs are robust enough that we can assume there has been some effect," she says. "We need a better understanding of what strategies are most effective. I worry that people think P4P is a good idea but don't learn from their mistakes."

While Dr. Rosenthal believes P4P programs can be generalized and applied to different locations, she recognizes that in some markets with monopolies, it may not be possible to negotiate P4P contracts. "The main limitation is the extent of payer fragmentation," she says.

Can P4P be cost-effective?

The biggest threat Dr. Rosenthal sees to future success of P4P programs is the need to be cost-effective. "We are desperate to find ways to reduce health care costs," she says. "It is the central challenge for U.S. health policy. But we are equally desperate to improve quality. And pay-for-performance is saddled with these twin problems. An emphasis on cost could jeopardize doctors' willingness to participate. I wonder if there are other strategies more appropriate for cost control so P4P could be left for quality improvement."

Dr. Rosenthal projects that five years from now P4P programs will look much as they do today. She says they may represent a somewhat higher percentage of total payments to providers. They still will have mostly process measures and intermediate outcome measures. She hopes there will be decent quality measures for a wider range of physician specialties. She anticipates more case rate payments and more groups taking full capitation.

"It will be a good thing if programs are similar to today's and are routine," Dr. Rosenthal says, "as long as they have valid performance measures used for payment. They don't have to be perfect performance measures. It's OK if they are good enough for 10% to 15% of total payment. It's not necessary to put 25% into P4P. It will be good if P4P programs can continue to operate and we can get on with the agenda to create a more sustainable health care system."

The study appeared in the November/December 2007 issue of Health Affairs. Contact Dr. Rosenthal at (617) 432-3418 or e-mail mrosenth@hsph.harvard.edu.