Generic tools measure quality of life outcomes

Accountability must include social impact of care

John E. Ware Jr., PhD, is our guest this month. He is known internationally as development director of the SF-36 and the SF-12, the most widely used patient-based health surveys. For nearly 30 years, Ware's research focus has been the quantification of health outcomes. For the past decade, he has served as senior scientist and director of the Health Assessment Lab at the Health Institute of the New England Medical Center in Boston. Previously, he worked at the RAND Corporation in Santa Monica, CA.

Ware is also the founding president of QualityMetric Inc. This Lincoln, RI-based firm is dedicated to designing the next generation of patient-based measures of health outcomes.

Q. We're seeing a trend toward measuring the clinical outcomes of health care with tools like the Joint Commission on Accreditation of Healthcare Organizations' ORYX Plus. How accurately do those reflect the effectiveness of patient care as compared to patient self-assessments like the SF-36?

A. Usually the word we use is valid as opposed to accurate because it's not just an issue of the precision of clinical measures, but whether they are measuring all of the right things. The paradigm shift represented in the outcomes movement is not just an emphasis on results. It's a shift in the definition of what we're going to hold the health care system accountable for.

We now view specific clinical outcomes - orthopedic surgery that improves knee rotation or an inhaler that improves lung function - as inputs to an equation for improving human functioning in everyday life. This equation defines outcomes not in terms of specific organ functioning - Does my knee function? Does my lung function? - but in terms of human function and well-being.

Functioning refers to what people are able to do in everyday life. Well-being refers to how they feel. So from this point of view, we would view the current array of traditional clinical outcomes as being very important but incomplete. They tell the doctor and the payer whether treatment is having the desired, specific results. But that doesn't tell us the social value of the treatment.

We can know the latter from patient-based assessments of functional health and well-being, using measures of generic outcomes like the Sickness Impact Profile, the Duke Health Profile, and the SF-36 short-form health survey, which are available from the Medical Outcomes Trust on the Internet.

So, I would argue that the traditional clinical outcomes database needs to be supplemented to monitor outcomes in the terms that matter most to the public and to employers. But we shouldn't monitor generic outcomes instead of traditional clinical endpoints. We really want to know both answers.

We want to know which treatments have their desired effects. As a society, we can't pay for every treatment. Therefore, we need data about generic outcomes to know which treatments have the greatest social value.

Q. How critical is it that we start looking at those measures of functional health and well-being as well as the clinical outcomes?

A. I don't think you could exaggerate how critical it is to do this. First of all, we have no way of comparing the burden of different diseases using specific clinical measures. For example, I can't compare someone's lung function with someone's heart function, or with someone's knee rotation. So we need generic measures - that are not specific to any one disease or treatment - for use as a common denominator for such comparisons.

There really are two big questions: First, what is causing the most morbidity? Or, what are the diseases and conditions that are disrupting life in America the most? We need generic measures to answer these questions.

The second big question is, where is the biggest bang for the buck? If we can fix or replace most organs whenever there's something wrong, then obviously we must identify and invest in those treatments that have the most impact.

To compare the benefit of different treatments, we again need a generic metric. We also need generic tools to inform medical decision making, to improve the cost effectiveness of care - one treatment, one patient, one decision at a time.

The word is out that we can't possibly meet our financial objectives for health care without providing treatment more rationally. For some people, that means rationing. It means sometimes withholding treatment from people who could benefit from it because there are more cost-effective ways to use those funds. So it is critical that we have an information system that will help inform those difficult decisions.

Adding the generic tools is a big step in that direction. They're not the only part of the database, but they're a very important addition to the clinical and economic information that is already in the database.

Q. With generic tools like the SF-36 and the SF-12, have we hit on the right means to tap these functional health and well-being indicators?

A. I think it's correct to say that the SF-36 and the SF-12 are the most widely used generic health outcomes tools, not only in the United States but throughout the world.

From the time it was first made available in early 1989 through the end of 1996, there were 450 publications about the SF-36. But in 1997 alone, the number increased by two-thirds, to over 750.

There are now one or more studies for over 100 different conditions. There are more than 150 longitudinal studies of treatment outcomes. It's a very broad literature. That proves unequivocally that by using standardized questionnaires, we can get valid data from the great majority of patients.

A lot of people have argued that patients say whatever they want on these forms, and the data are not reliable and valid. The evidence is now overwhelming that, when properly used and standardized, these kinds of tools can add a lot to the database and results can be compared.

Q. What are the political barriers preventing wider use of these kinds of tools?

A. I think one barrier is resistance due to a lack of familiarity with the new patient-based measurement tools. It's not that doctors don't understand functional health and well-being. Some of them understand them very well. It's just that they're not used to measuring them to achieve reproducible scores. In other words, they're not used to standardized tests for health outcomes. They lack confidence in these generic tools, and they view their results as soft data.

Another political issue is that generic outcomes tools transfer some of the authority in health care to the public, to the consumer. We used to go to the doctor to find out how we're doing. Now we go to the doctor to tell him or her how we're doing, and society is judging the doctor on what patients say.

This is a political issue not only in health outcomes assessment, but also with respect to measures of patient satisfaction which focus on care and services. Many providers don't even like the term consumer. And many don't like to be called providers. Regardless, the public's judgment of the acceptability of health care delivery has become one "bottom line" in defining health care. That creates a political problem because it shifts the expertise to the consumer.

The auto industry had a similar problem decades ago. It survived, and I think the health care system will also. Like cars, health care will be greatly improved when we listen to the voice of the public.

Q. Are generic measures like the SF-36 and the SF-12 sufficiently standardized to avoid the kind of debacle we often see where, for example, Hospital A and Hospital B measure outcomes differently, so comparisons are meaningless?

A. We recommend a standardized form, which is clearly useful. It helps a lot to make it available royalty-free. That removed a practical barrier to its use.

We also know it's important to analyze data in a highly standardized way. It isn't enough to standardize the instrument and the scoring algorithms; we also have to standardize the display and offer a meaningful interpretation. That's equally important. So, it begins with standardizing the measure, but we have to carry that standardization out much further.

There's a tremendous amount of variability in outcomes. We have to standardize the definition and we have to apply a valid risk adjustment to level the playing field when outcomes are compared. Otherwise, comparisons won't be fair, and results will be confusing.
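As an illustration of what "leveling the playing field" can mean in practice, here is a minimal sketch of one common form of risk adjustment, indirect standardization: each hospital's observed mean outcome is compared with the mean that would be expected given its own mix of low-risk and high-risk patients. The hospitals, risk strata, scores, and function names are all hypothetical and made up for this example; this is not the method used in any SF-36 analysis described in the interview.

```python
from collections import defaultdict

# Hypothetical records: (hospital, risk_stratum, outcome_score).
# All values are invented for illustration; they are not SF-36 data.
records = [
    ("A", "low", 80.0), ("A", "low", 70.0), ("A", "high", 50.0),
    ("B", "low", 70.0), ("B", "high", 50.0), ("B", "high", 40.0),
]

def stratum_means(rows):
    """Mean outcome per risk stratum, pooled across all hospitals."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, stratum, score in rows:
        totals[stratum][0] += score
        totals[stratum][1] += 1
    return {s: total / n for s, (total, n) in totals.items()}

def raw_mean(rows, hospital):
    """Unadjusted mean outcome for one hospital."""
    scores = [y for h, _, y in rows if h == hospital]
    return sum(scores) / len(scores)

def observed_minus_expected(rows, hospital):
    """Observed mean minus the mean expected for that hospital's case mix."""
    expected = stratum_means(rows)
    own = [(s, y) for h, s, y in rows if h == hospital]
    observed_mean = sum(y for _, y in own) / len(own)
    expected_mean = sum(expected[s] for s, _ in own) / len(own)
    return observed_mean - expected_mean

for name in ("A", "B"):
    print(name, round(raw_mean(records, name), 2),
          round(observed_minus_expected(records, name), 2))
```

In this made-up data, Hospital B treats more high-risk patients, so its raw mean looks much worse than Hospital A's. Once each hospital is compared with what its own case mix would predict, the gap between the two is considerably smaller.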

Q. Is such confusion still present in the analysis of the SF-36 and the SF-12?

A. We're pleased with how consistently most steps are being performed. One approach that helped was putting a diskette in the SF-36 user's manual with scoring algorithms and a test data set. As a result, almost everyone uses the same scoring, so we now have a great deal of comparable data. It wouldn't be comparable if each step in scoring weren't performed in exactly the same way.

The big issue now is whether SF-36 forms are used in a well-designed monitoring system. For example, some hospitals administer an SF-36 after people are discharged, but they don't have a baseline to compare them with. Others measure patients before and after treatment, but they don't have a control group or norms for changes over time.

In addition to reliable and valid measures, it's important that the design for monitoring patients be scientifically sound. Having a valid measure doesn't guarantee that you'll have an interpretable result. The literature includes very good examples of how to do this.

One of our efforts to help this along is an accredited continuing medical education multimedia series titled Understanding Health Outcomes. It's accredited for physicians, clinical pharmacists, nurses, and physician assistants. The first nine programs are helping people understand how advanced this field is and what the concepts and tools are.

The SF-36 is not the only good tool. The series covers many good tools, both generic and specific. This training program, with videos, study guides, and tests, is helping people understand that these tools have advanced considerably. The tools will continue to advance, and people need to do their homework to select and use them properly.
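The monitoring design described in this answer, a baseline score and a follow-up score for each patient, read against a comparison group, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the scores are hypothetical, the helper names are invented for the example, and a real evaluation would also need adequate sample sizes, risk adjustment, and published norms for change over time.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def mean_change(pairs):
    """Average of (follow_up - baseline) over (baseline, follow_up) pairs."""
    return mean([after - before for before, after in pairs])

# Hypothetical (baseline, follow_up) scores; invented for illustration only.
treated = [(40.0, 55.0), (50.0, 60.0), (45.0, 50.0)]
comparison = [(42.0, 46.0), (48.0, 50.0), (44.0, 47.0)]

treated_change = mean_change(treated)        # change seen with treatment
comparison_change = mean_change(comparison)  # change seen without it
net_change = treated_change - comparison_change

print(treated_change, comparison_change, net_change)
```

Without the baseline, the follow-up scores alone say nothing about change; without the comparison group, there is no way to tell how much of the treated group's change would have happened anyway.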

Q. You mentioned that we are going to have to ration health care, and we need tools to tell us where to get the biggest bang for the treatment buck. Could you explain how generic tools could guide those choices?

A. We've all heard about small-area variation in utilization and surgical rates, and variations in practice style. If you haven't looked at the Dartmouth Atlas, you should. (See information box at the end of this article.) It's impressive how much totally illogical variation there is in the utilization of medical care services.

What we'll discover next is an equally large variation in health status and in health outcome, which I would define as a change in health status over time. We know that the variation in health outcome cannot be explained in terms of measurement error. The variations are real. The information system we need to manage these variations is one that tells us about the costs of health care and the health benefits at the level of an individual patient.

This management information system should include the best clinical information as well as highly standardized information about what's going on in the patient's life and health care costs. This is important because we know from clinical trials that even within a homogeneous group of patients (with the same diagnosis and a narrow range of disease severity) a substantial proportion will have a health outcome that is no different from the untreated placebo group.

When to provide or withhold treatment

Managed care should be about providing treatment when it's clearly going to make a difference and withholding treatment when it's not. That means we need to be able to predict, as well as possible, which patients are - and which are not - going to benefit from treatment. That calculation is going to have to be made one patient at a time.

For example, there is a new drug treatment (cilostazol) that is being evaluated for patients with intermittent claudication (occasional limping caused by interrupted blood supply from narrowed arteries). The treatment helps most patients increase their walking distances and improves health-related quality of life as measured by the SF-36. But there is also an unknown cardiovascular risk. Is that risk worth taking? How much of a benefit is the patient getting?

To answer those important questions, we need an information system that can improve medical decision making at the individual patient level. It must include cost, clinical benefit, and the effect on the patient's life. This could revolutionize the way we deliver health care, and the way we set and evaluate treatment priorities.

We want an information system for monitoring at a population level, such as everyone who has had heart surgery, and we also want a system we can use at the individual patient level. We're going to manage care one patient, one treatment, one decision at a time.

There's no doubt that the SF-36 and other generic tools are going to complement the clinical and economic data for purposes of better health care decisions. The voice of the public has been under-represented in decisions up until now. Too many decisions have been made on the basis of economic data and sometimes on the basis of economic and clinical data.

Some think of health care as a three-legged stool: clinical, economic, and patient perspectives. The stool is going to tip over without that third leg. The third leg is basically how the public is judging the benefits and acceptability of the way we deliver care. Better patient-based measures will fundamentally change the way we deliver health care and the way we define the benefits of health care.

After the current concern about quality is over, health care will never be the same. We'll never deliver health care again without an information system that represents well the needs and expectations of the public. That's why we need these generic tools.