Annals of Emergency Medicine
Volume 36, Issue 3, Pages 233-236, September 2000

Dueling Meta-Analyses

Department of Emergency Medicine, University of Florida Health Science Center, Jacksonville, FL

Abstract 

[Wears RL. Dueling meta-analyses. Ann Emerg Med. September 2000;36:234-236.]

 

See related articles, p. 181 and p. 191.

The replication of others’ results is a fundamental principle of the scientific method, yet we rarely see replication even attempted, much less reported, in medical investigations. In this issue of Annals, the meta-analyses by Rowe et al1 and Alter et al2 on the use of magnesium in the treatment of acute bronchospasm offer a rare opportunity for independent, direct comparison of research methods and results. As is the case with most meta-analyses, the differences between studies are more interesting than the similarities.3 It is instructive to compare these articles, particularly in light of a recent call for standardized reporting of meta-analyses.4 There are 3 general areas of difference: the studies selected for inclusion, the assessment of study quality, and the analysis of the results.

With similar questions, sources, and criteria, we would expect to find largely the same studies included in each meta-analysis. Both groups cast their nets widely to try to retrieve all relevant studies, and both offered detail on the articles rejected, but in the end, they came up with slightly different sets of studies. Rowe et al,1 from the Cochrane group, accepted 7 articles, and Alter et al2 accepted 9. One article included by Alter et al dealt with chronic obstructive pulmonary disease, so its exclusion from Rowe et al’s set, which was limited to asthma only, is appropriate. The absence of the other article is puzzling, because it appears to be relevant and is not excluded by any of the stated exclusion criteria. It does not appear on the list of excluded studies (provided in the electronic version of the paper in the Cochrane Library), and was discovered by Alter et al’s group via author consultation, not by database searching.

This illustrates 3 important points. First, it is extraordinarily difficult to be certain that all the available evidence has been obtained when evidence on a clinical question is being sought. This has important implications for advocates of evidence-based medicine. In this movement, great emphasis is placed on the teaching and practice of critical appraisal, but relatively little attention has been paid to the problem of finding the relevant evidence to begin with. Second, it emphasizes the importance of prospective registries of randomized trials as a method of obtaining information about this “gray literature.” Annals participates in the Controlled Trial Registry (as do many other journals), but trials become registered by this mechanism only after submission of a report. Unsubmitted data, or data initially published as an abstract but never followed by a full article, are generally unregistered. This is a serious ethical issue, since in even the most benign of trials, human subjects are placed at some potential risk (eg, they might be assigned to the study arm with the worse outcome). The justification for this risk is the potential contribution to scientific knowledge, but if the information gathered is not accessible in some form, that justification disappears. Third, automated searching of databases such as MEDLINE or EMBASE is not sufficient to ensure a comprehensive assembly of the relevant evidence. Fortunately, in this case it does not appear that the additional study changed the results, but the potential for one or more inadvertently omitted studies to do so in other settings is apparent. However, it is reassuring that a core subset of 7 studies was selected by 2 groups of analysts working independently.

Even though the 2 analyses shared many of the same studies, they did not read them in exactly the same way. For example, they report different sample sizes for one study. This might be explicable based on their handling of different allocation groups, but emphasizes the need for careful quality control in the data extraction phase of a meta-analysis. There are further differences in quality assessment of the component studies. Both groups used ostensibly the same quality instrument,5 but their descriptions of it sound as if 2 different methods were applied. Of the 7 studies they had in common, they disagreed on 4 of the quality scores, often by as much as 2 points on a 5-point scale. However, we may take some relief from a recent study comparing quality scoring schemes. It showed substantial disagreement between scoring instruments, disagreement so great that the results of a meta-analysis could swing from affirming an effect to denying one, based on the quality score used.6 In other words, the variation in the quality scores may not be important because quality scoring itself is suspect.

The problem with quality scoring is that there are many dimensions to quality in a scientific investigation, and they are relatively independent. Some of these dimensions (eg, random allocation, blind outcome assessment, intention-to-treat analysis) relate to bias in estimation, whereas others (eg, completeness of reporting, handling of ethical issues) may not. This problem of multidimensionality means that combining these dimensions into a single “quality score” will be problematic, for 2 reasons. First, the appropriate weighting of the different dimensions is unknown, and would differ depending on the purpose for which the scale is used. Second, some dimensions may be irrelevant for some purposes, so their inclusion in a quality system adds noise and obscures relevant information. The difficulties with quality scoring should not be misconstrued as evidence that quality does not matter. There is abundant evidence that poorer performance in certain aspects of clinical trials is associated with inflated (ie, biased) estimates of effect.7, 8, 9, 10, 11 The solution is to abandon attempts to reduce quality to a single number, and instead to incorporate relevant components of quality into the meta-analysis, either through meta-regression models, or through qualitative analysis.2, 12, 13
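The weighting problem can be made concrete with a small sketch. The two trials, the component names, and both weighting schemes below are hypothetical and chosen only for illustration; they are not drawn from either meta-analysis or from any published quality instrument. The point is simply that the same component ratings can rank two trials in opposite orders, depending on which weights are applied.

```python
def quality_score(components, weights):
    """Collapse per-dimension ratings into one number using the given weights."""
    return sum(weights[name] * rating for name, rating in components.items())

# Hypothetical component ratings (1 = adequate, 0 = inadequate).
trial_x = {"allocation_concealment": 1, "blinding": 1, "reporting_completeness": 0}
trial_y = {"allocation_concealment": 0, "blinding": 1, "reporting_completeness": 1}

# Two hypothetical weighting schemes: one emphasizes protection against bias,
# the other emphasizes completeness of reporting.
bias_weights = {"allocation_concealment": 3, "blinding": 2, "reporting_completeness": 0}
reporting_weights = {"allocation_concealment": 1, "blinding": 1, "reporting_completeness": 3}

x_bias = quality_score(trial_x, bias_weights)
y_bias = quality_score(trial_y, bias_weights)
x_reporting = quality_score(trial_x, reporting_weights)
y_reporting = quality_score(trial_y, reporting_weights)

# Trial X scores higher under the bias-oriented weights, and trial Y scores
# higher under the reporting-oriented weights.
print(x_bias, y_bias, x_reporting, y_reporting)
```

Keeping the components separate, rather than summing them, avoids having to choose among such weightings at all.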

Although the procedural differences between the 2 meta-analyses are interesting, the ultimate question is, did they arrive at the same conclusion? It would be a bad day indeed for meta-analysis if 2 analyses using largely the same data came to diametrically opposed conclusions. A meta-analysis produces 2 fundamental results: a measure of the variability of effect among studies (termed heterogeneity) and a pooled estimate of effect magnitude. Heterogeneity is an important outcome, because if the results differ substantively among studies, there is some question about whether it is reasonable to combine their results at all.14 If heterogeneity is found, it is important that its source be investigated; in fact, this is sometimes more useful than the pooled effect estimate,15, 16 even though this investigation is often post hoc.
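For readers unfamiliar with these 2 quantities, the following is a minimal sketch of one common way to compute them: an inverse-variance (fixed-effect) pooled estimate and Cochran's Q statistic for heterogeneity. It is not necessarily the method used by either group, and the effect sizes and standard errors are invented for illustration.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling, with Cochran's Q for heterogeneity.

    Returns (pooled estimate, its standard error, Q, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Q is the weighted sum of squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return pooled, pooled_se, q, df

# Hypothetical study-level effects (eg, a difference in PEFR) and standard errors.
effects = [10.0, 20.0, 30.0]
std_errors = [5.0, 5.0, 5.0]

pooled, pooled_se, q, df = pool_fixed_effect(effects, std_errors)
print(pooled, pooled_se, q, df)
```

Under the null hypothesis of homogeneity, Q is conventionally compared with a chi-square distribution on df degrees of freedom; a Q well above df suggests that the studies differ by more than chance alone would explain.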

Here there is some cause for concern in that Rowe et al1 found considerable heterogeneity among their component studies, whereas Alter et al2 did not, even when both groups analyzed changes in pulmonary function tests as outcome measures. Rowe et al commendably spent considerable effort investigating sources of heterogeneity, and found much of the variability could be explained by differences in disease severity. When stratified on severity (in an a priori, planned subgroup analysis), they found a favorable response to magnesium in patients with severe asthma, and no response in patients with mild disease, and further that the response was not heterogeneous within those severity groups.

Statistical tests of heterogeneity are known to suffer from low power, particularly when the number of studies to be combined is small, so a negative test statistic, such as Alter et al2 obtained, should be viewed with some caution, particularly if it is close to the traditional “significance” cutoff.16, 17, 18 In most cases, it is reasonable to presume that clinical trials will be somewhat heterogeneous, so failing to reject a null hypothesis of homogeneity does not necessarily provide much assurance that no heterogeneity is present. An additional factor explaining the difference in heterogeneity might be that Alter et al used standardized effect measures, whereas Rowe et al1 performed the analysis in each study’s “natural units.” Standardized effect measures have long been used in the nonmedical meta-analytic world as a means of combining measurements of the same underlying phenomenon that are taken on different scales, such as peak expiratory flow rate (PEFR) and FEV1.19, 20 This likely reflects the social science origins of meta-analysis and social scientists’ interest in scale development. By using a standardized effect, Alter et al were able to use data from all 9 studies in calculating a pooled estimate of effect, whereas Rowe et al were limited to subsets that used common measures, thus losing some statistical power. However, standardized effect measures have been seriously criticized, at least in the medical realm, because the “standard unit” used varies across studies.21 In some situations, this can minimize heterogeneity, while in others, it can increase it. It can even reverse the order of effects.22 For this reason, it seems reasonable to give more weight to the analysis performed in natural units, at least with respect to heterogeneity, even while reserving judgment on the general usefulness of standardized effect measures.
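The way standardization can reverse the order of effects is easy to show with a small sketch. The 2 studies and their numbers below are hypothetical, and the standardized effect is taken here simply as the mean difference divided by that study's own standard deviation, which is one common definition and an assumption on our part.

```python
def standardized_effect(mean_diff, sd):
    """Mean difference expressed in units of the study's own standard deviation."""
    return mean_diff / sd

# Hypothetical studies: (mean difference in natural units, standard deviation).
studies = {"A": (30.0, 60.0), "B": (40.0, 100.0)}

# Rank the studies from largest to smallest effect on each scale.
natural_order = sorted(studies, key=lambda s: studies[s][0], reverse=True)
standardized_order = sorted(
    studies, key=lambda s: standardized_effect(*studies[s]), reverse=True
)

# Study B has the larger effect in natural units (40 versus 30), but study A has
# the larger standardized effect (0.5 versus 0.4) because its population is less variable.
print(natural_order, standardized_order)
```

Because the divisor is a property of each study's sample rather than of the treatment, differences in standardized effects can reflect differences in population variability as much as differences in response.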

Finally, it is important to note that despite their differences, the general tenor of the results of the 2 meta-analyses is similar. Both agree that magnesium can be beneficial, at least to severe asthmatic patients; both estimates of effect magnitude are reasonably close; and both agree that the effect is modest, not dramatic.

Just as in meta-analysis, exploring the differences between studies can be more enlightening than examining their similarities. We are fortunate to have had the opportunity to make such a comparison. In the future, authors should be encouraged to produce independent replications of work, and editors and journals encouraged to publish them. To do otherwise would be unscientific.

References 

  1. Rowe BH, Bretzlaff JA, Bourdon C, et al. Intravenous magnesium sulfate treatment for acute asthma in the emergency department: a systematic review of the literature. Ann Emerg Med. 2000;36:181–190
  2. Alter HJ, Koepsell TD, Hilty WM. Intravenous magnesium as an adjuvant in acute bronchospasm: a meta-analysis. Ann Emerg Med. 2000;36:191–197
  3. Greenland S. Can meta-analysis be salvaged? Am J Epidemiol. 1994;140:783–787
  4. Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet. 1999;354:1896–1900
  5. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17:1–12
  6. Juni P, Witschi A, Bloch R, et al. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282:1054–1060
  7. Chalmers TC, Celano P, Sacks HS, et al. Bias in treatment allocation in controlled clinical trials. N Engl J Med. 1983;309:1358–1361
  8. Schulz KF, Chalmers I, Hayes RJ, et al. Empirical evidence of bias. JAMA. 1995;273:408–412
  9. Moher D, Pham B, Jones A, et al. Does the quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609–613
  10. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999;282:1061–1066
  11. Coronary Drug Project Research Group. Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. N Engl J Med. 1980;303:1038–1041
  12. Greenland S. Quality scores are useless and potentially misleading. Am J Epidemiol. 1994;140:300–301
  13. Berlin JA, Rennie D. Measuring the quality of trials: the quality of quality scales. JAMA. 1999;282:1083–1085
  14. Greenland S. Invited commentary: a critical look at some popular meta-analytic methods. Am J Epidemiol. 1994;140:290–296
  15. Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated. BMJ. 1994;309:1351–1355
  16. Pocock S. Meta-analysis. Stat Methods Med Res. 1993;2:117–119
  17. Boissel JP, Blanchard J, Panak E, et al. Considerations for the meta-analysis of randomized clinical trials. Summary of a panel discussion. Control Clin Trials. 1989;10:254–281
  18. L’Abbé KA, Detsky AS, O’Rourke K. Meta-analysis in clinical research. Ann Intern Med. 1987;107:224–233
  19. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. New York, NY: Academic Press; 1985
  20. Cooper H, Hedges LV, eds. Handbook of Research Synthesis. New York, NY: Russell Sage Foundation; 1994
  21. Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiol Rev. 1987;9:1–30
  22. Greenland S, Schlesselman JJ, Criqui MH. The fallacy of employing standardized regression coefficients and correlations as measures of effect. Am J Epidemiol. 1986;123:203–208

Address for reprints: Robert L. Wears, MD, MS, Department of Emergency Medicine, University of Florida Health Science Center, 655 West 8th Street, Jacksonville, FL 32209; 904-549-4124, 904-549-4508; E-mail wears@ufl.edu.

PII: S0196-0644(00)26585-2

doi:10.1067/mem.2000.109692
