Special Feature

Comparison Groups and Generalization

By Kenneth L. Noller, MD

During the past several years, I have tried to draw attention to the extreme importance of study design in clinical investigation, both in my comments on articles and in a few of these special features. In this piece I want to focus on aspects of study planning and interpretation I have not previously covered.

There are 3 important questions to ask each time you read a paper:

1. What type of study is this?

2. If comparison (control) groups are used, are they appropriate?

3. Are the authors’ conclusions appropriate, or have they generalized results that are applicable only to a small group?

There is a fourth important consideration that is largely beyond the ability of most of us to decipher—the appropriateness of the statistical analyses. In general, we must rely on the journal to have considered this aspect of the publication.

In a previous piece I dealt with the various types of studies that are commonly presented in the medical literature and their relative ratings. In this article I will deal with the concept of comparison (control) groups and generalization of results.

The Myth of Randomization and Controls

There is no doubt that the randomized clinical trial (RCT) is the most powerful type of study currently published in the medical literature. In this type of study the investigators recruit study participants and, typically, assign them at random either to treatment or to nontreatment (placebo agents). In many chemotherapy studies, patients are randomly assigned to a new drug protocol or a standard drug protocol. Various versions are possible. Although mistakes are occasionally made in randomization, it is a rather simple task that can be accomplished with any table of random numbers, whether printed or generated by a computer. The more important aspect of random assignment is recruitment of the study participants who will be randomized. In virtually all cases, the group to be randomized is not representative of the general female population. For example, if a chemotherapy agent is to be evaluated, perhaps only women with ovarian cancer would be eligible for recruitment.
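The random assignment step described above can be sketched in a few lines. This is only an illustration, not a procedure taken from any particular trial: the function name `randomize`, the arm labels, the participant numbers, and the seed are all assumptions made for the example. A seeded random number generator plays the role of the table of random numbers.

```python
import random


def randomize(participant_ids, arms=("treatment", "placebo"), seed=None):
    """Randomly assign each participant to one study arm.

    Hypothetical helper for illustration only. The seeded generator
    stands in for a printed or computer-generated random number table.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # put the participants in random order
    # Deal the shuffled participants out to the arms in turn, so the
    # arm sizes never differ by more than one.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


# Twenty hypothetical participants, numbered 1 through 20.
assignments = randomize(range(1, 21), seed=2024)
n_treatment = sum(1 for arm in assignments.values() if arm == "treatment")
n_placebo = sum(1 for arm in assignments.values() if arm == "placebo")
print(n_treatment, n_placebo)
```

Note what the sketch does not do: it only divides up the women who were already recruited. If that group is unrepresentative, flawless random assignment will not make it representative.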

Another subtlety is that those women who volunteer to be in an RCT may not be representative of all women with the disease. For example, if an ovarian cancer chemotherapeutic regimen is to be tested, it is possible that only women who were feeling better (or worse) or women who were highly educated (or less well educated) would tend to volunteer. The point is that there are multiple ways to invalidate a randomized trial. Since the results of such RCTs might lead to new medications (for example, antibiotics) being widely used, and not just in patients with histories similar to those in the trials, the chance that error might result in inappropriate drug usage is significant.

The other type of study that directly compares groups is the case-control study. This is an efficient and widely used study design. A group of women with a given disease, a given exposure, or a given surgical outcome is compared to a group of women without the disease, exposure, or surgery. In almost all cases, there is no problem establishing the "case" group. For example, an investigator might choose to study the differences between women with and without endometrial adenocarcinoma. The group of women with the disease is easy to establish, although even there the investigator must be aware that bias can be introduced (for example, patients seen at a tertiary referral center might not be representative of all women with endometrial adenocarcinoma). The real problem in a case-control study is the development of an appropriate comparison (control) group. In our example, what would be the appropriate comparison group for women with endometrial adenocarcinoma? Certainly, there should be some age matching, as younger women virtually never develop the disease. The investigators must struggle with many decisions regarding matching. In general, it is better to match on very few variables because no statement can be made regarding a matched variable. For example, if those with and without uterine cancer were matched for the use of hormone replacement therapy (HRT), it would not be possible to draw any conclusion regarding the role of HRT, since it would be present to an equal extent in both groups.
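A small numerical sketch may make the point about matched variables concrete. Every number below is invented purely for illustration and describes no real study: 10 hypothetical cases, 6 of whom use HRT, and a pool of 40 hypothetical women without the disease, 10 of whom use HRT. The helper names `odds_ratio` and `match_controls` are likewise assumptions for the example.

```python
def odds_ratio(cases, controls, exposure):
    """Odds of exposure among cases divided by odds among controls."""
    a = sum(1 for p in cases if p[exposure])     # exposed cases
    b = len(cases) - a                           # unexposed cases
    c = sum(1 for p in controls if p[exposure])  # exposed controls
    d = len(controls) - c                        # unexposed controls
    return (a * d) / (b * c)


def match_controls(cases, pool, variable):
    """Pick one control per case with the same value of `variable`."""
    available = list(pool)
    matched = []
    for case in cases:
        for i, candidate in enumerate(available):
            if candidate[variable] == case[variable]:
                matched.append(available.pop(i))  # use each control once
                break
        else:
            raise ValueError("no remaining control matches this case")
    return matched


# Invented data: 6 of 10 cases use HRT; 10 of 40 women in the pool do.
cases = [{"hrt": i < 6} for i in range(10)]
pool = [{"hrt": i < 10} for i in range(40)]

unmatched_or = odds_ratio(cases, pool, "hrt")
matched_or = odds_ratio(cases, match_controls(cases, pool, "hrt"), "hrt")
print(unmatched_or, matched_or)
```

With these invented numbers, comparing the cases to the whole pool gives an odds ratio of 4.5 for HRT. After matching on HRT, the controls contain exactly as many HRT users as the cases, so the odds ratio is forced to 1.0 no matter what the true relationship is. The matched comparison can say nothing about HRT.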

I have never encountered a "perfect" control group in any case-control study. It is virtually always possible to quarrel with some of the decisions made by the investigators in developing their controls. However, thoughtful investigators can come close, and can adjust for some variables with appropriate statistical analyses. Overall, I like the case-control study design, but know it has limitations.

Generalization of Results

Let’s assume that you have carefully reviewed an article, determined the type of study being presented, and examined the comparison group and found it satisfactory. What then remains? The answer, of course, is that when conclusions are drawn, the results of the study must be applied only to the appropriate population base.

Perhaps an example would be helpful to make my point. Recently I was asked by a major OB/GYN journal to review an article in which the stated purpose of the study was to determine whether it is appropriate to follow women with minimally abnormal Pap smears. It was a cohort study that followed the women in the study population for 3 years. The authors concluded that it was inappropriate to follow women with minimally abnormal Pap smears because so few return for follow-up visits. Their blanket conclusion was applied to all women. However, the study population consisted of extremely poor, itinerant women in a rural setting. Many of the women did not speak English and had virtually no education. It was clearly inappropriate for the authors to conclude that longitudinal follow-up of all women with minimally abnormal Pap smears was inappropriate. Rather, they should have concluded that poor, itinerant, and poorly educated women in a rural setting might not be managed appropriately with cytology follow-up. That conclusion might also be true for rich urban women, but the authors did not study that segment of the population.

Because so much emphasis recently has been placed on study design, I am finding that over-generalization of study results is the most common major error in published manuscripts.