St. John’s Wort and Depression

June 2001; Volume 3; 46-47

Source: Shelton RC, et al. Effectiveness of St. John’s wort in major depression: A randomized controlled trial. JAMA 2001;285:1978-1986.

Design and Setting: Randomized, double-blind, placebo-controlled, multicenter clinical trial. Subjects underwent a one-week, single-blind placebo run-in; those whose HAM-D scores improved by 25% or fell to < 20 were excluded.

Subjects: Two hundred adult outpatients (67.0% female; 85.9% white) with major depression and Hamilton depression (HAM-D) 17-item scale scores of ≥ 20.

Treatment/Dose/Route/Duration: Placebo or a standardized extract of St. John’s wort 300 mg (Lichtwer Pharma GmbH) tid for eight weeks; after four weeks, if no effect was seen, the dose was increased to four tablets daily (1,200 mg).

Outcome Measures: HAM-D scores analyzed by a random coefficient regression model that examined differences in linear rate of change. Secondary efficacy variables included the Beck depression inventory, the Hamilton rating scale for anxiety (HAM-A), the global assessment of functioning scale (GAF), and the clinical global impression-severity (CGI-S) and -improvement (CGI-I) scales. An intention-to-treat (ITT) analysis was done; a secondary subgroup analysis included only patients with a baseline HAM-D score of ≥ 22. Differences in response and remission rates were examined using Cochran-Mantel-Haenszel tests. Assessments of safety and efficacy were made at the end of weeks 1, 2, 4, 6, and 8.

Results: There were no significant differences between groups in any of the assessment scales. There were significant effects for time but not for treatment or time by treatment interaction. The proportion of subjects who responded was not different between groups. The only significant treatment effect was the number of remissions, which was significantly higher in the treated group (14/98 [14.3%]) than in the placebo group (5/102 [4.9%]). The treatment was well-tolerated. However, 41% of those treated with St. John’s wort experienced headache compared to 25% in the placebo group, a significant difference.

Funding: Pfizer Inc. (which manufactures both pharmaceutical antidepressants and St. John’s wort extracts).

Comments by Adriane Fugh-Berman, MD, and Steven Bratman, MD

Although this was a well-designed trial, the write-up is deficient. The attention to design details is laudable: for example, care was taken to mask the taste and smell of the verum treatment; individual lots of the treatment were analyzed to ensure consistency; a dose increase was built in; and HAM-D assessments were videotaped and reviewed by an independent assessor. In addition, the study population appears to have been selected to represent an appropriate level of depression for comparison with previous St. John’s wort studies: The median entry score of 22 on the 17-item HAM-D is very close to the entry scores seen in, for example, two recent trials, a three-armed trial comparing St. John’s wort to imipramine and placebo in 263 participants,1 and another comparing St. John’s wort to imipramine in 324 participants.2

The presentation of results, however, raises questions about the heterogeneity of the study population. Given that there were 14 remissions in the St. John’s wort group and five in the placebo group, responses must have been quite heterogeneous for the graph of HAM-D scores (presented as unadjusted means) to show the two groups almost overlapping. It would have been nice to see the actual distribution of scores. The mean "duration of current major depressive disorder" was 2.3 years (SD 6.3) in the St. John’s wort group and 2.7 years (SD 5.6) in the placebo group. Assuming that this number refers to time elapsed since the diagnosis of major depressive disorder (if it refers to the duration of the current episode, these patients would be atypically chronic), the standard deviations show that the distribution is dramatically skewed. It raises the question of whether the outliers unbalance the population enough to affect results.
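The skew argument can be checked with a rough calculation. Duration cannot be negative, so a standard deviation several times larger than the mean is incompatible with a symmetric distribution. The sketch below uses the reported means and SDs; the normal curve is an assumption used only to show how badly a symmetric shape fits, and the names `durations` and `share_below_zero` are illustrative.

```python
from statistics import NormalDist

# Reported "duration of current major depressive disorder": (mean, SD) in years
durations = {
    "St. John's wort": (2.3, 6.3),
    "placebo": (2.7, 5.6),
}


def share_below_zero(mean, sd):
    """Share of a normal distribution with this mean and SD that lies below zero.

    The normal shape is an assumption made only for illustration: a large
    share below zero means a symmetric distribution cannot describe the data.
    """
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


results = {}
for group, (mean, sd) in durations.items():
    results[group] = {
        "cv": sd / mean,  # coefficient of variation (SD relative to mean)
        "below_zero": share_below_zero(mean, sd),
    }
    print(
        f"{group}: SD/mean = {results[group]['cv']:.2f}, "
        f"implied share of negative durations = {results[group]['below_zero']:.0%}"
    )
```

In both groups the SD is more than twice the mean, and a symmetric curve would place roughly a third of subjects at impossible negative durations. The real distribution must therefore have most subjects near zero and a few with very long durations.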

The investigators’ designation of subjects with a Hamilton score of < 22 as less severely depressed is idiosyncratic. Within this trial, this could only include subjects with a HAM-D score of 20 or 21, and it is entirely arbitrary to designate someone with a score of 21 as less severely depressed than someone with a score of 22. The single positive finding in this trial was a higher remission rate among the treated group. Although remission rates were low, the authors should not have dismissed this finding; remission is of clearer benefit to the patient than an improvement in HAM-D scores.
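To put the remission finding in clinical terms, the reported counts (14/98 vs. 5/102) can be turned into an absolute difference and a number needed to treat. The sketch below does this and adds a two-sided Fisher exact test as a rough check. Fisher's test is a substitute chosen here for simplicity; the investigators used Cochran-Mantel-Haenszel tests, so the p-value need not match the published one. All variable and function names are illustrative.

```python
from math import comb

# Remission counts as reported in the Results
sjw_remit, sjw_n = 14, 98
placebo_remit, placebo_n = 5, 102

rate_sjw = sjw_remit / sjw_n
rate_placebo = placebo_remit / placebo_n
risk_difference = rate_sjw - rate_placebo   # absolute difference in remission
rate_ratio = rate_sjw / rate_placebo        # relative remission rate
nnt = 1 / risk_difference                   # number needed to treat


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


p_value = fisher_exact_two_sided(
    sjw_remit, sjw_n - sjw_remit,
    placebo_remit, placebo_n - placebo_remit,
)

print(f"Remission: {rate_sjw:.1%} vs. {rate_placebo:.1%}")
print(f"Absolute difference: {risk_difference:.1%}, rate ratio: {rate_ratio:.2f}")
print(f"Number needed to treat: about {nnt:.1f}")
print(f"Fisher exact p (two-sided): {p_value:.3f}")
```

On these counts the absolute difference is about nine percentage points, which corresponds to roughly 11 patients treated for each additional remission.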

The desire that one’s research will supersede all other research in the field is probably common, but most authors are better at keeping such uncollegial thoughts under wraps. The table entitled "Design limitations of previous controlled trials of St. John’s wort" purports to document deficiencies in design but actually contains mainly opinions; framing them in table cells does not make them anything more than opinions.

The seven column titles are: Diagnostic practices/heterogeneity; Less experienced investigators; No standardized symptom ratings; Low depression severity; Small sample size/inadequate power; Low comparator dose/no plasma levels; and Low St. John’s wort dose. Trials are assessed by single checkmarks in the columns. Of the seven criteria, the charge of diagnostic practices/heterogeneity is fair and the claim that this affects almost all previous trials is probably accurate. Lack of standardized symptom ratings is also a fair study limitation (but this only affects 7/31 trials).

The category of "Less experienced investigators" seems both vague and subjective; the only explanation given of how this was assessed is "use of investigators without apparent experience in psychiatry or research." This may refer to a major difference in subject recruitment; in the United States, it is common for a small number of investigators to recruit a relatively large number of patients, often through advertising. In Germany, most studies are performed by a relatively large number of primary care physicians, each recruiting a relatively small number of patients from within their practices. One system has not been proven better than the other.

The designation of "Low depression severity," defined as the inclusion of mildly depressed subjects with HAM-D scores of < 18, as a methodological flaw is peculiar (deliberate inclusion of mildly depressed subjects is hardly a design flaw). "Small sample size/inadequate power" is explained only by saying that "statistical power was too low to detect meaningful differences between groups." This is nonsensical in placebo-controlled trials (12/17 checked in this column); if differences were detected between groups, then power simply is not an issue. The statement could be defended for treatment-controlled trials, but even by the authors’ unknown criteria, this only affects five of 14 treatment-controlled trials.

"Low comparator dose/no plasma levels" (affecting 13 studies) also is unfair, as well as an odd pairing (it is not indicated which charge applies to each study). Although the doses of (primarily tricyclic) drugs used as comparators in some studies are less than those routinely used as starting doses in the United States, the doses were standard therapeutic doses when and where the studies were performed. The majority of trials (including the largest and most recent) used adequate doses of antidepressants. Lack of antidepressant plasma levels should not be considered a methodological flaw when such levels are not routinely required in depression studies.

"Low St. John’s wort dose" (defined as" < 600 mg/d standardized hypericin concentration)" is idiosyncratic. "Standardized hypericin concentration" is meaningless without an indication of the concentration (the usual standardized extract would provide 2.7 mg/d hypericin), but, that aside, lower doses of 350-500 mg/d St. John’s wort extract (containing 0.5-0.75 mg/d hypericin) have achieved positive results against placebo in at least four placebo-controlled trials.3 (Hypericin is only one of the potential active ingredients in St. John’s wort). Although the 900 mg dose has become the standard, this is not based on data. The authors themselves are not wedded to 900 mg, as one of their exclusion criteria was a "prior adequate trial of St. John’s wort (at least 450 mg/d)."

Study duration also is criticized in the text, despite the fact that the current eight-week study is not markedly longer than previous studies (20/31 were ≥ six weeks). Finally, one recent study was inexplicably omitted: A 42-day double-blind, placebo-controlled trial of 142 participants with mild-to-moderate depression according to DSM-IV criteria.4

Dr. Bratman is Medical Director and Senior Editor of


1. Philipp M, et al. Hypericum extract versus imipramine or placebo in patients with moderate depression: Randomised multicentre study of treatment for eight weeks. BMJ 1999;319:1534-1538.

2. Woelk H. Comparison of St. John’s wort and imipramine for treating depression: Randomised controlled trial. BMJ 2000;321:536-539.

3. Linde K, et al. St. John’s wort for depression—an overview and meta-analysis of randomized clinical trials. BMJ 1996;313:253-258.

4. Laakmann G, et al. St. John’s wort in mild to moderate depression: The relevance of hyperforin for the clinical efficacy. Pharmacopsychiatry 1998;31(Suppl):54-59.