From Observation to Measurement: How Research Gets Started

By Howell Sasser, PhD. Dr. Sasser is a scientific review coordinator with Manila Consulting Group, and an adjunct member of the faculty of New York Medical College. He reports no financial relationships relevant to this field of study.

Editor's Note: This is the first in a three-part series about the design and conduct of clinical research. It is not meant to make an expert of anyone, but it is intended to demystify the research process and perhaps make the reader a more astute consumer of the clinical literature. An understanding of the process by which research is carried out is a very useful tool in judging the quality of the results that it produces.

It would be fair to say that nearly all formal clinical research begins before it is even recognized as such. Observant clinicians note patterns, or what appear to be patterns, in those they treat. These may be broad ("there appear to be more women than men with lupus") or specific ("it appears that those taking a statin and niacin for cholesterol control do better than those taking a statin alone"). Many such observations are fleeting, or turn out to be spurious. Those that survive to be acted on must be recast as questions that can be tested.

These questions are sometimes called falsifiable propositions because they are couched in terms that allow them to be tested against data and either supported or refuted. So, rather than ask simply whether patients do better on Drug A than on Drug B, we might ask whether patients taking 50 mg of Drug A twice a day for 3 weeks have a lower rate of relapse (to be defined as a specific percent difference) than those taking 100 mg of Drug B once daily for the same period. This gives the study's designer clear guidance about what the study procedures should be. It also gives the study statistician the information needed to decide how to assess the results and to calculate how many participants are needed for those results to be statistically sound.
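
To make the link between a precisely stated question and the number of participants concrete, here is a minimal sketch in Python. It assumes the outcome is a simple relapse proportion in each group and uses the common normal-approximation formula for comparing two proportions. The function name, the relapse rates, and the defaults (a two-sided 5% significance level and 80% power) are illustrative assumptions, not values taken from any study discussed here.

```python
import math
from statistics import NormalDist


def participants_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions.

    Normal-approximation formula for a two-sided test. p_a and p_b are the
    relapse proportions the investigators expect (or wish to detect) in
    each group. This is a planning sketch, not a full power analysis.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value tied to desired power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    n = (z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2
    return math.ceil(n)


# Hypothetical relapse rates: 20% on Drug A versus 30% or 40% on Drug B.
n_small_gap = participants_per_group(0.20, 0.30)
n_large_gap = participants_per_group(0.20, 0.40)
print("Per group, 20% vs 30%:", n_small_gap)
print("Per group, 20% vs 40%:", n_large_gap)
```

The smaller the difference the question commits to detecting, the more participants are required, which is why the "specific percent difference" has to be settled before the study begins.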

As this example may suggest, the formulation of questions is an especially important part of the research process, since the way a question is phrased shapes how a subsequent study is designed to evaluate it. This, in turn, affects the quality of inference that can be drawn from the study's results. A study that answers the wrong question, or that produces inconclusive results, can be worse than no study at all.

Although the questions that are the nucleus of clinical research cover every imaginable disease process and therapeutic strategy, they can be grouped into a small number of generic categories. As described here, the categories are most applicable to experimental studies (i.e., those in which the investigator manipulates the exposure of interest, usually by determining who receives it and in what manner). However, they can easily be adapted to observational studies (i.e., those in which the investigator observes and records the characteristics and outcomes of a population, but does not intervene to alter the distribution or intensity of potentially protective or harmful exposures). At the risk of reductionism, the categories might be described as better than nothing, as good as what we have, and better than what we have.

Better than nothing: At times, there is no effective, or at least generally accepted, treatment available for a condition. In such situations, there is not necessarily any scientific or ethical objection to comparing an experimental therapy to existing supportive or palliative treatments, to a placebo, or to nothing. The common thread is that there must be a comparison. Simply trying a therapy in a series of patients and reporting the results leaves unanswered the question of what those patients' outcomes would have been had they been treated in any other way. If those outcomes would not have been meaningfully different, what claim can we make about the effectiveness of the new therapy? The value of using a placebo in a group of "control" patients, or at least of trying to be systematic about what non-curative treatments they receive, is that it makes the later statistical comparison cleaner. The fewer the unknown and uncontrolled factors, the clearer the inference about what effect the experimental treatment had.

For example, Barrett and colleagues compared echinacea with a placebo for the treatment of recent-onset cases of the common cold.1 They recruited 719 people between the ages of 12 and 80 and assigned them randomly to receive a pill containing echinacea and labeled as such, a pill containing echinacea but not identified (a "blinded" group), an identical-appearing pill containing only inert ingredients (a placebo, also blinded), or no pills at all. The effect of treatment was assessed by measuring the time to resolution of symptoms in each group. There was a trend toward shorter duration of symptoms with echinacea, but this effect was not pronounced enough to be statistically significant.

As good as what we have: Whenever an existing treatment is available that is judged to be efficacious, even imperfectly so, the use of a placebo-controlled design becomes ethically suspect. Withholding a treatment with known benefit for the purpose of scientific observation is almost always impermissible. When an effective treatment is already in use, the experimental focus turns to how a new treatment performs in comparison with it. In some cases, the goal may be to show that the new therapy is as effective as the existing one. If the new treatment has fewer side effects, or is easier to administer, or is less expensive, comparable efficacy may be all that is required. Studies with this goal are sometimes called equivalence trials (the term bioequivalence is also used, although strictly speaking it refers to pharmacokinetic comparisons). Because showing precisely the same effect with two or more agents is unlikely, a range is defined within which observed effects are understood to be functionally equivalent, even if not identical. This is a statistical process, but it is driven by clinical considerations.
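
The idea of a predefined range can be sketched in a few lines of Python. The example below assumes a yes/no outcome (such as treatment success) and uses one common approach: compute a confidence interval for the difference between the two success proportions and declare the treatments functionally equivalent only if the whole interval falls inside the margin. The function name, the counts, and the 10-percentage-point margin are hypothetical. In a real trial the margin would be set on clinical grounds before any data were collected.

```python
import math
from statistics import NormalDist


def equivalence_check(success_new, n_new, success_old, n_old, margin, alpha=0.05):
    """Return (lower, upper, is_equivalent) for the difference in proportions.

    Uses a normal-approximation interval. A (1 - 2*alpha) interval lying
    entirely within (-margin, +margin) corresponds to two one-sided tests,
    each at level alpha.
    """
    p_new = success_new / n_new
    p_old = success_old / n_old
    diff = p_new - p_old
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, (-margin < lower and upper < margin)


# Hypothetical trials with the same observed success rates (80% vs 82%)
# but very different numbers of participants.
large_trial = equivalence_check(160, 200, 164, 200, margin=0.10)
small_trial = equivalence_check(40, 50, 41, 50, margin=0.10)
print("Large trial:", large_trial)
print("Small trial:", small_trial)
```

With identical observed rates, only the larger trial yields an interval narrow enough to fit inside the margin. The smaller one is simply inconclusive, which is one reason studies of this kind tend to need many participants.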

Studies of this type are still uncommon in the complementary and alternative medicine (CAM) literature, in part because they typically require very large study populations and, as a consequence, are often cumbersome and expensive to conduct. An example from elsewhere in the literature is a study by Baruch and colleagues assessing the relative accuracy of calculated and directly measured low-density lipoprotein cholesterol (LDL-C).2 The calculated method is less expensive and may have greater validity as a measure because much of the published literature on LDL-C is based on it. A group of 81 participants had LDL-C measured simultaneously by both methods, and some had follow-up measurements as well. All pairs of measurements were included in the analysis. The results showed that although the two methods were highly correlated, the difference between them was large enough, in enough cases, to call into question whether they are equivalent and whether the direct method can fairly stand in for the calculated method where the latter is not feasible.
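
The distinction between two methods being highly correlated and being equivalent can be shown with a small Python sketch. The numbers below are invented for illustration and are not data from the Baruch study. The sketch computes the Pearson correlation and then the average difference between paired measurements with approximate 95% limits of agreement (the approach often associated with Bland and Altman); whether the study's authors used this particular analysis is not stated here.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def limits_of_agreement(x, y):
    """Mean paired difference (x - y) and approximate 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Invented LDL-C values in mg/dL for eight hypothetical participants.
calculated = [70, 85, 98, 110, 124, 139, 151, 166]
direct = [80, 93, 111, 119, 138, 148, 165, 175]

r = pearson_r(direct, calculated)
bias, lower, upper = limits_of_agreement(direct, calculated)
print("Correlation:", round(r, 3))
print("Mean difference (direct - calculated):", round(bias, 2))
print("Limits of agreement:", round(lower, 2), "to", round(upper, 2))
```

In this made-up example the two methods track each other almost perfectly, yet the direct values run consistently higher. Whether a gap of that size matters is a clinical judgment, not something the correlation coefficient can answer.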

Better than what we have: This is perhaps the most familiar situation in current clinical research practice. Because many conditions have therapies or clinical management strategies, many new agents and approaches are tested against the existing standard of care. The usual goal is to show that the new therapy is "better," though how much better it must be, and indeed how "better" is defined, is case-specific and may even change over the course of a single study. The key issue with this sort of question is clarity and precision in what is being compared.

An example of this is a 2002 study by Targ and Levine comparing standard group support and a CAM-based intervention for women undergoing treatment for breast cancer.3 A group of 181 women were assigned randomly to the two interventions and assessed on a number of measures of psychological well-being after 12 weeks. Those in both therapeutic arms showed meaningful before–after improvements, and those in the CAM arm showed greater improvement on some measures. The investigators concluded that the interventions were similar in their effect, although this was not a formal bioequivalence trial.

As is implied in the descriptions above, the main question spawns a series of subsidiary questions: How much "better" is enough? As compared to what? Under what set of clinical or demographic or social conditions? The answers to these questions help to flesh out the design of a study that may produce results bearing on the main question. The importance of asking, and appropriately answering, the questions that guide the design of a study cannot be overemphasized. Careful planning at the beginning of a research project improves the probability of producing clinically and statistically relevant results later.

The next article in the series will deal with how the study question connects with one of the available study designs.


1. Barrett B, et al. Echinacea for treating the common cold: A randomized trial. Ann Intern Med 2010;153:769-777.

2. Baruch L, et al. Is directly measured low-density lipoprotein clinically equivalent to calculated low-density lipoprotein? J Clin Lipidol 2010;4:259-264.

3. Targ EF, Levine EG. The efficacy of a mind-body-spirit group for women with breast cancer: A randomized controlled trial. Gen Hosp Psychiatry 2002;24:238-248.