Randomized, Controlled Trials—Strengths and Weaknesses
By Jun Takezawa, MD
The randomized, controlled trial (RCT) is believed to provide the strongest evidence for verifying both the effectiveness and the ineffectiveness of a given treatment. Once an RCT judges a proposed treatment to be ineffective, the treatment is rarely evaluated again. However, because of the various critical limitations inherent in RCTs, great caution is required in interpreting their results. The risks of misinterpreting an RCT are considered here, using RCTs on the acute respiratory distress syndrome (ARDS) as an example.
Current diagnostic criteria such as chest x-ray1 and impairment in pulmonary oxygenation [eg, P(A-a)O2] allow for serious diagnostic inaccuracy. Because positive end-expiratory pressure (PEEP) and/or prone positioning can easily change P(A-a)O2, ARDS patients who respond to these manipulations can be intentionally allocated to the treatment group, which leads to selection bias, especially when the trial cannot be blinded. The criteria used for diagnosis, which aim to clarify the mechanism of the disease or syndrome, should be clearly distinguished from the parameters used to evaluate the effect of a treatment or intervention on patient outcome. Chest x-ray findings, oxygenation, CO2 elimination, and pulmonary mechanics may help in investigating the mechanism or process of ARDS, but they may not stratify the severity of ARDS in terms of mortality.
The mortality of ARDS has been reported to be 30-60%. However, the RCTs reported by the ARDS network enrolled patients with a high predicted mortality of more than 75%, as estimated by the APACHE III scoring system.2-5 When mortality is taken as the primary end point and the patients' predicted mortality is this high, a vast number of patients is required to prove the effectiveness of a treatment, even if it is truly effective. Additionally, in spite of the high predicted mortality, the actual mortality of the ARDS network patients was 30-40% in both the control and treatment groups. This can be interpreted as meaning either that the superimposed development of ARDS improves the outcome of the original disease or that the existing standard treatment for ARDS was much more effective than the proposed treatment.
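The gap between predicted and observed mortality can be put into a single number. The following is a minimal Python sketch, not part of the original analysis, that computes the ratio of observed to predicted mortality (often called a standardized mortality ratio) from the figures quoted above. The function name and the use of 75% as the predicted value are assumptions made for illustration; because 75% is only the lower bound quoted in the text, the ratios shown are upper bounds.

```python
def standardized_mortality_ratio(observed: float, predicted: float) -> float:
    """Return observed mortality divided by predicted mortality (both as fractions)."""
    if not 0 < predicted <= 1:
        raise ValueError("predicted mortality must be in (0, 1]")
    if not 0 <= observed <= 1:
        raise ValueError("observed mortality must be in [0, 1]")
    return observed / predicted


# Figures quoted in the text: predicted mortality above 75%, observed 30-40%.
# ASSUMPTION: 0.75 is used as the predicted value, so the ratios are upper bounds.
PREDICTED = 0.75
smr_low = standardized_mortality_ratio(0.30, PREDICTED)
smr_high = standardized_mortality_ratio(0.40, PREDICTED)
print(f"Observed/predicted mortality ratio: {smr_low:.2f} to {smr_high:.2f}")
```

A ratio well below 1 in both arms means that roughly half as many patients died as the score predicted, whatever the allocated treatment.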
The clinical entity of ARDS comprises a mix of cases, and a beneficial effect of a given treatment on a certain subgroup of ARDS patients can be obscured by the presence of other subgroups that do not respond to the treatment. ARDS can develop through various etiologies, with sepsis, pneumonia, chest trauma, and massive transfusion as common examples. Steroids given to ARDS patients whose ARDS was due to a mixture of causes were reported to provide no benefit. There is little doubt that steroids are harmful where sepsis and bacterial pneumonia are involved in the development of ARDS. However, if ARDS develops as a result of chemical pneumonitis or some types of interstitial pneumonitis, steroids may well play a role in improving the outcome. Thus, even when a subgroup of ARDS patients responds to a given treatment, the effect may not be detected when that subgroup is analyzed as part of the whole group, because the effect size in the whole group is small. Patient selection and case-mix composition are therefore important in interpreting the results, especially when a negative result is obtained.
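The dilution of a subgroup effect by the rest of the case mix is simple weighted arithmetic. The sketch below uses entirely hypothetical numbers (a responsive subgroup making up 20% of enrolled patients, with mortality falling from 40% to 20%, and a non-responsive remainder) to show how the overall absolute risk reduction shrinks. The class and function names are illustrative assumptions, not taken from any cited trial.

```python
from typing import NamedTuple, Sequence


class Subgroup(NamedTuple):
    name: str
    fraction: float           # share of all enrolled patients
    control_mortality: float  # mortality without the treatment
    treated_mortality: float  # mortality with the treatment


def overall_mortality(subgroups: Sequence[Subgroup], treated: bool) -> float:
    """Case-mix weighted mortality for the whole trial population."""
    if abs(sum(s.fraction for s in subgroups) - 1.0) > 1e-9:
        raise ValueError("subgroup fractions must sum to 1")
    return sum(
        s.fraction * (s.treated_mortality if treated else s.control_mortality)
        for s in subgroups
    )


# HYPOTHETICAL case mix, chosen only to illustrate the dilution effect.
case_mix = [
    Subgroup("responders", 0.20, 0.40, 0.20),
    Subgroup("non-responders", 0.80, 0.40, 0.40),
]

arr_responders = case_mix[0].control_mortality - case_mix[0].treated_mortality
arr_overall = overall_mortality(case_mix, treated=False) - overall_mortality(
    case_mix, treated=True
)
print(f"Absolute risk reduction in responders: {arr_responders:.2f}")
print(f"Absolute risk reduction in whole trial: {arr_overall:.2f}")
```

Under these assumed numbers, a 20-point reduction in the responsive subgroup appears as a 4-point reduction in the trial as a whole, which is much harder to detect.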
Magnitude of Significance
When the risk reduction is very small yet still statistically significant, a large number of patients is required to prove effectiveness, and the number needed to treat (NNT) becomes very large. Thus, even if the proposed treatment is verified as effective, it is unlikely to be considered feasible from the viewpoint of cost-effectiveness. Therefore, the NNT should be calculated at the interim analysis to determine whether the trial should be continued.
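The relationship between a small risk reduction, the NNT, and the trial size can be sketched in a few lines of Python. The NNT is the reciprocal of the absolute risk reduction. The sample-size function uses the usual normal-approximation formula for comparing two proportions. The mortality figures, the two-sided alpha of 0.05, and the power of 80% are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """NNT = 1 / absolute risk reduction."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment must lower the risk for an NNT to be defined")
    return 1.0 / arr


def sample_size_per_arm(
    control_risk: float, treated_risk: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Patients per arm needed to detect the difference (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = control_risk * (1 - control_risk) + treated_risk * (1 - treated_risk)
    delta = control_risk - treated_risk
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta**2)


# HYPOTHETICAL mortality figures: a large and a small absolute risk reduction.
for control, treated in [(0.40, 0.30), (0.40, 0.38)]:
    nnt = number_needed_to_treat(control, treated)
    n = sample_size_per_arm(control, treated)
    print(f"{control:.2f} -> {treated:.2f}: NNT = {nnt:.0f}, patients per arm = {n}")
```

Under these assumptions, shrinking the absolute risk reduction from 10 points to 2 points multiplies the NNT by 5 and the required trial size by roughly 25, because the sample size scales with the inverse square of the difference.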
Prognosis Predicting Scoring Systems
Prognosis predicting scoring (PPS) systems such as APACHE and SAPS fail to predict the mortality of ARDS. Thus, even if the difference in APACHE or SAPS scores between the control and treatment groups is not significant, this does not indicate that the severity of ARDS is identical between the groups. In other words, the comparison could have been made between different patient populations. As long as mortality is taken as an end point, the PPS must be adjusted so that it stratifies the severity of ARDS. When the risk of ARDS is adjusted in this way, a different result may be obtained.
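How adjusting for severity can change a result is illustrated below with a stratified analysis. The sketch compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across severity strata. The Mantel-Haenszel estimator is one common adjustment method and is swapped in here for illustration, since the text does not name a specific technique. All counts are hypothetical and deliberately exaggerated so that the treatment group contains more severe cases.

```python
from typing import NamedTuple, Sequence


class Stratum(NamedTuple):
    treated_deaths: int
    treated_survivors: int
    control_deaths: int
    control_survivors: int


def crude_odds_ratio(strata: Sequence[Stratum]) -> float:
    """Odds ratio of death (treated vs control) ignoring severity."""
    a = sum(s.treated_deaths for s in strata)
    b = sum(s.treated_survivors for s in strata)
    c = sum(s.control_deaths for s in strata)
    d = sum(s.control_survivors for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: Sequence[Stratum]) -> float:
    """Odds ratio of death pooled across severity strata (Mantel-Haenszel)."""
    numerator = denominator = 0.0
    for s in strata:
        n = sum(s)  # all patients in this stratum
        numerator += s.treated_deaths * s.control_survivors / n
        denominator += s.treated_survivors * s.control_deaths / n
    return numerator / denominator


# HYPOTHETICAL counts: the treated arm holds most of the severe cases.
strata = [
    Stratum(10, 90, 60, 240),   # mild ARDS
    Stratum(120, 180, 60, 40),  # severe ARDS
]
crude = crude_odds_ratio(strata)
adjusted = mantel_haenszel_odds_ratio(strata)
print(f"Crude odds ratio:    {crude:.2f}")
print(f"Adjusted odds ratio: {adjusted:.2f}")
```

With these assumed counts, the crude comparison suggests slightly higher mortality with treatment, whereas the severity-adjusted comparison suggests a clear benefit within every stratum.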
In order to eliminate selection bias, patients are randomly allocated so that known and unknown confounders are distributed equally between the arms. Although the mean airway pressure (mPaw) is known to be related to pulmonary oxygenation, the RCTs that compared pulmonary oxygenation during airway pressure release ventilation (APRV) or high-frequency oscillation (HFO) with that during a conventional ventilatory mode were conducted at different mPaw.6,7 In other words, the known confounder (mPaw) was not distributed equally between the groups. The comparison was in effect made between different patient populations, and therefore the result cannot be verified.
In the case of the ARDS network study on the low tidal volume strategy,3 ventilator-delivered tidal volume (VT; VTvent) was restricted to less than 6 mL/kg in the treatment group and 12 mL/kg in the control group, using the assist-control mode. Accordingly, peak inspiratory and plateau pressures (PIP and Pplat, respectively) were lower in the low VTvent group. However, total VT (VTtot), which is composed of both the patient-generated VT (VTpt) and VTvent, was identical in the 2 groups (see Table 1). When VTvent is restricted to a small volume, the patient has to make more effort to obtain the previous VTtot, which forces the patient to increase respiratory rate (RR) and/or inspiratory effort. (In fact, both RR and minute volume [MV] were increased in the low VTvent group.) During partial ventilatory assist, transpulmonary pressure is responsible for distending the lung, as well as for developing ventilator-induced lung injury. Therefore, although VTvent was preset in the 2 groups, it is not known whether the actual transpulmonary pressure was similar or different between the groups.
Table 1. Comparison Between Low VT and High VT Strategy3
| Parameter | Low VT | High VT |
| --- | --- | --- |
| APACHE III | 81 ± 28 | 84 ± 28 |
| VTtot (mL) | 676 ± 119 | 665 ± 125 |
| Minute Volume (L/min) | 13.4 ± 4.3 | 12.7 ± 4.3 |
| Respiratory Rate, Day 3 (breaths/min) | 30 ± 7 | 17 ± 7 |
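One way to see how far apart two groups are on a measured variable is the standardized mean difference (SMD), the difference in means divided by a pooled standard deviation. The sketch below applies it to the values in Table 1. It assumes that the ± figures are standard deviations, which the table does not state. The threshold of about 0.1 mentioned in the comment is a commonly quoted rule of thumb, not a value taken from the cited study.

```python
import math


def standardized_mean_difference(
    mean_a: float, sd_a: float, mean_b: float, sd_b: float
) -> float:
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Values from Table 1 as (low VT mean, SD, high VT mean, SD).
# ASSUMPTION: the +/- figures in the table are standard deviations.
table_1 = {
    "APACHE III": (81, 28, 84, 28),
    "VTtot": (676, 119, 665, 125),
    "Minute volume": (13.4, 4.3, 12.7, 4.3),
    "Respiratory rate (day 3)": (30, 7, 17, 7),
}

smd = {
    name: standardized_mean_difference(*values) for name, values in table_1.items()
}
for name, value in smd.items():
    # An absolute SMD above roughly 0.1 is often read as a noticeable imbalance.
    print(f"{name}: SMD = {value:+.2f}")
```

Read this way, the groups look similar on APACHE III and VTtot but differ by almost 2 standard deviations in respiratory rate on day 3, which matches the point that the patients in the low VT group were breathing very differently.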
Racial and Cultural Differences
Because most RCTs have been conducted on Western patient populations, direct extrapolation of their results to other races is not guaranteed. Some RCTs have shown different results between Western and Asian populations. For example, RCTs on the calcium channel blocking agent nitrendipine in the treatment of isolated systolic hypertension were conducted in China (Syst-China) and Europe (Syst-Euro) on their respective populations and reached different results (see Table 2).8,9 The incidence of stroke was reduced by nitrendipine in both populations. However, the treatment significantly reduced all-cause mortality in older Chinese patients with isolated systolic hypertension but not in European patients.
Table 2. Results of Syst-Euro and Syst-China Studies8,9
| Parameter | Syst-Euro | Syst-China |
| --- | --- | --- |
| Age (years) | > 60 | 66.5 (mean) |
| Baseline systolic BP (mm Hg) | 160-219 | 172 (mean) |
| Number of patients | 4395 | 2394 |
| Nitrendipine dose | 10-40 mg/d | 10-40 mg/d |
| Stroke incidence | -42% (P = .003) | -38% (P = .01) |
| Cardiovascular mortality | -24% (P = .07) | -39% (P = .02) |
| All-cause mortality | -14% (P = .22) | -39% (P = .003) |
Another example is provided by prospective studies on anticoagulant (warfarin) therapy for prevention of cerebral events due to atrial fibrillation (AF), which were conducted in European and Japanese populations.10,11 In the European study, 214 patients with nonrheumatic AF received warfarin, and the incidence of ischemic and hemorrhagic events was monitored for 2 years. The optimal international normalized ratio (INR) during warfarin therapy was 2.0-3.9. No effect was observed when INR was < 2.0, and major hemorrhagic events occurred when INR was > 5.0. In the Japanese study, an RCT conducted on nonvalvular AF patients younger than 80 years old, the patients were allocated to a conventional-intensity INR (2.2-3.5) group (55 patients) or a low-intensity INR (1.5-2.1) group (60 patients). The annual incidence of recurrent ischemic stroke and hemorrhagic complications was monitored, and the annual incidence of ischemic stroke was found to be similar between the groups. However, the trial was stopped at 658 days when 6 patients in the conventional-intensity group died of hemorrhage, an incidence (6.6%/yr) significantly higher than that in the low-intensity group. It was concluded that a lower target INR was required in older Japanese patients receiving warfarin to prevent secondary stroke while avoiding hemorrhagic complications.
Although a meta-analysis is considered to be equivalent in some ways to a large-scale RCT, discrepancies between the two are frequent.12 Because RCTs with small sample sizes carry a risk of uneven distribution of confounders, the evidence from a large-scale RCT is much stronger than that from an accumulation of small RCTs, as in a meta-analysis. Another problem of meta-analysis is publication bias. It is well known that RCTs with results favorable to the authors are likely to be published, whereas those with unfavorable results are likely to be discarded. If only the published trials are included in a meta-analysis, the result will mislead readers.
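The effect of publication bias on a pooled estimate can be demonstrated with a small simulation. The sketch below generates many small trials of a treatment that has no true effect, "publishes" only those whose observed result favors the treatment by a margin, and compares the pooled risk difference of the published trials with that of all trials. The trial size, mortality, publication rule, and random seed are all assumptions chosen for illustration. Because every simulated trial has the same size, a simple average is used as the pooled estimate.

```python
import random


def simulate_null_trial(rng: random.Random, n_per_arm: int, mortality: float) -> float:
    """Observed risk difference (control minus treated) when the treatment does nothing."""
    control_deaths = sum(rng.random() < mortality for _ in range(n_per_arm))
    treated_deaths = sum(rng.random() < mortality for _ in range(n_per_arm))
    return (control_deaths - treated_deaths) / n_per_arm


def pooled_difference(differences: list[float]) -> float:
    """Pooled risk difference for equally sized trials (simple average)."""
    return sum(differences) / len(differences)


# ASSUMPTIONS: 200 trials, 50 patients per arm, 40% mortality in both arms,
# and only trials showing more than a 10-point benefit get published.
rng = random.Random(2024)
all_trials = [simulate_null_trial(rng, n_per_arm=50, mortality=0.40) for _ in range(200)]
published = [d for d in all_trials if d > 0.10]

pooled_all = pooled_difference(all_trials)
pooled_published = pooled_difference(published)
print(f"Trials run: {len(all_trials)}, published: {len(published)}")
print(f"Pooled risk difference, all trials:       {pooled_all:+.3f}")
print(f"Pooled risk difference, published trials: {pooled_published:+.3f}")
```

Although the simulated treatment is inert, pooling only the published trials yields an apparent benefit of more than 10 points, whereas pooling every trial gives a difference close to zero.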
In summary, careful evaluation of any RCT is required to interpret the results. Patient population (subgroup analysis), severity of illness, confounders, end point, clinical feasibility, and cost-effectiveness are to be strictly evaluated. Investigations into the mechanism of the disease should never be confused with the clinical trials.
References
1. Meade MO, et al. Interobserver variation in interpreting chest radiographs for the diagnosis of acute respiratory distress syndrome. Am J Respir Crit Care Med. 2000;161:85-90.
2. Sirio CA, et al. A cross-cultural comparison of critical care delivery: Japan and the United States. Chest. 2002;121:326-328.
3. The ARDS network: Ventilation with lower tidal volumes as compared with traditional tidal volumes for acute lung injury and the acute respiratory distress syndrome. N Engl J Med. 2000;342:1301-1308.
4. The ARDS network: Ketoconazole for early treatment of acute lung injury and acute respiratory distress syndrome: A randomized controlled trial. JAMA. 2000;283:1995-2002.
5. The ARDS network: Randomized, placebo-controlled trial of lisofylline for early treatment of acute lung injury and acute respiratory distress syndrome. Crit Care Med. 2002;30:1-6.
6. Putensen C, et al. Long-term effects of spontaneous breathing during ventilatory support in patients with acute lung injury. Am J Respir Crit Care Med. 2001;164:43-49.
7. Mehta S, et al. Prospective trial of high-frequency oscillation in adults with acute respiratory distress syndrome. Crit Care Med. 2001;29:1360-1369.
8. Liu L, et al. Comparison of active treatment and placebo in older Chinese patients with isolated systolic hypertension. J Hypertension. 1998;16:1832-1839.
9. Staessen JA, Robert F. Randomised double-blind comparison of placebo and active treatment for older patients with isolated systolic hypertension. Lancet. 1997;350:757-764.
10. The European Atrial Fibrillation Trial Study Group: Optimal oral anticoagulant therapy in patients with nonrheumatic atrial fibrillation and recent cerebral ischemia. N Engl J Med. 1995;333:5-10.
11. Yamaguchi T. Optimal intensity of warfarin therapy for secondary prevention of stroke in patients with nonvalvular atrial fibrillation: A multicenter, prospective, randomized trial. Stroke. 2000;31:817-821.
12. LeLorier J, et al. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337:536-542.