Annals of Emergency Medicine
Volume 52, Issue 4 , Pages 458-472, October 2008

Acutely Decompensated Heart Failure in a County Emergency Department: A Double-Blind Randomized Controlled Comparison of Nesiritide Versus Placebo Treatment:

Answers to May 2008 Journal Club Questions

  • Tyler W. Barrett, MD
    • Vanderbilt University Medical Center, Nashville, TN
  • David L. Schriger, MD, MPH
    • University of California, Los Angeles, Los Angeles, CA


Discussion points 


1. In 2002, the Journal of the American Medical Association (JAMA) published an article from the Vasodilatation in the Management of Acute CHF (VMAC) Investigators that reported that nesiritide improves hemodynamic function and self-reported global clinical status more effectively than intravenous nitroglycerin or placebo.1 This article was chosen by the American Board of Emergency Medicine's Lifelong Learning and Self Assessment (LLSA) for the 2005 Reading List. However, a meta-analysis of 3 trials published 3 years later in JAMA concluded that nesiritide may be associated with an increased risk of death after treatment for acutely decompensated congestive heart failure.2 This article was chosen for the 2007 LLSA Reading List.

A. Conduct a brief review of medical literature, tracing the history of nesiritide from its Food and Drug Administration (FDA) approval in 2001 to the current knowledge of risk and benefits of this drug. According to these published trials, what is your opinion on whether nesiritide should be included in the emergency department (ED) treatment of acutely decompensated congestive heart failure?

B. According to your review for question 1A, now consider what information was known about the drug when Miller et al3 began their study in February 2004 and when they submitted the article for publication in July 2007. How might the studies critical of nesiritide and published between 2004 and 2007 have affected this article's discussion points?

C. Hauptman et al4 published an article in 2006, showing that use of nesiritide decreased from 17% to 6% of patients admitted with heart failure between March and December 2005 after publication of 2 articles citing increased mortality and worsening renal function in March and April 2005. What factors might have contributed to this significant decrease in use? Has the use of nesiritide rebounded in the United States, or were these studies the “nail in the coffin” for the drug? What resources might a physician use to research the prescribing rates of medications in the United States?

2. A. Define “confounding.” Be sure to develop both a technical definition and a general understanding of what the term means. Draw a basic causal diagram that includes an exposure (treatment), an outcome, and a confounder (see Greenland et al5 and Glymour and Greenland6 for some help with this). Define Hume's counterfactual definition of causation and explain how it relates to medical research.

B. Using the causal diagram you drew in question 2a, explain how a randomized trial, in theory, can eliminate confounding. What conditions are necessary to maximize the likelihood that randomization achieves this goal? Name some of the events that can happen during a clinical randomized trial that might undo the effects of randomization and produce a confounded result.

C. How do we know whether a randomized trial is confounded? Researchers often present a table that compares characteristics of the subjects in different arms of the study. Is a statistical comparison of these characteristics helpful? Why or why not? What are some pitfalls in trying to determine whether a study is confounded? What are some solutions? Discuss Table 1 from the Miller et al3 article in light of the above. Do you think the study could be confounded? How?

3. Intravenous nitroglycerin is frequently used in the ED treatment of patients with acutely decompensated heart failure. The Heart Failure Society of America 2006 Comprehensive Heart Failure Practice guideline states that “in the absence of symptomatic hypotension, intravenous nitroglycerin, nitroprusside, or nesiritide may be considered as an addition to diuretic therapy for rapid improvement of congestive symptoms in patients admitted with acutely decompensated congestive heart failure.”7

A. Miller et al excluded patients who had new-onset congestive heart failure and those actively receiving nitroglycerin in the ED. Why might the authors have chosen to exclude these patients?

B. Many of the original clinical trials compared nesiritide to placebo and not intravenous nitroglycerin. Why might the manufacturer of a new study drug choose to compare their treatment to placebo instead of a commonly used therapy such as nitroglycerin? Does the FDA require pharmaceutical companies to test new treatments against currently accepted treatments or is a trial showing a benefit over placebo sufficient?

C. The VMAC trial compared nesiritide to placebo and nitroglycerin and reported improved hemodynamic function. Critics of this study contend that the dose of nitroglycerin used was lower than standard doses used in the clinical treatment of acutely decompensated congestive heart failure. According to your review of the medical literature, write a brief report for your hospital's pharmacy and therapeutics committee, comparing the benefits and drawbacks of nesiritide and intravenous nitroglycerin in the treatment of patients presenting to the ED with acutely decompensated congestive heart failure.

4. A. What is trial registration? Why is it important? This trial was registered. Find the registration page online. Is the trial registration information adequate? When was the trial registered? Are there any potential problems? What else would you like to see?

B. What is the Consolidated Standards of Reporting Trials (CONSORT) statement? Discuss the importance of items 6 through 11 in the CONSORT checklist (study outcomes, sample size, randomization generation, allocation concealment, randomization implementation, and blinding or masking). What are the pros and cons of reporting guidelines such as CONSORT, Quality of Reporting of Meta-Analysis (QUOROM), and Standards for Reporting of Diagnostic Accuracy (STARD)?

C. Discuss to what extent this study successfully addressed items 6 to 11 in CONSORT and how any shortcomings might bias this study.

5. A. In your opinion, what are the most important conclusions from this article? How might the limitations mentioned by the authors affect your decision whether to change your clinical practice with regard to the treatment of patients with acutely decompensated congestive heart failure in your ED?

B. Nesiritide is a recombinant form of the natural human peptide, hBNP. Studies have demonstrated that patients with acutely decompensated congestive heart failure often already have increased BNP levels in the blood.8, 9 If these patients already have increased levels of BNP circulating in their blood, how does nesiritide improve the treatment of acutely decompensated congestive heart failure?

C. What additional information or data analyses would you like the authors to provide for you to change your clinical practice?



Answer 1 

Q1.a In April 1998, a new drug application was filed with the FDA for nesiritide. Mills et al10 published an industry-supported blinded trial comparing 3 different doses of nesiritide to placebo and found statistically significant reductions in systemic vascular resistance and pulmonary-capillary wedge pressure (PCWP). In July 2000, Colucci et al11 published an industry-supported clinical trial in the New England Journal of Medicine that included both an efficacy trial and a comparative trial as part of the FDA approval process for nesiritide. The efficacy trial randomized 127 patients to one of 3 treatment arms: nesiritide at 2 different doses or placebo. The predefined primary endpoint was the change from baseline in the PCWP 6 hours after initiation of a nesiritide infusion. Secondary endpoints included assessments of global clinical status, dyspnea and fatigue measurements at baseline and at 6 hours after treatment, and additional hemodynamic measurements. The authors reported that PCWP decreased by 6.0 (SD 7.2) and 9.6 (6.2) mm Hg at 6 hours compared with a decrease of 2.0 (7.2) mm Hg in the placebo limb. Additionally, Colucci et al11 reported that patients receiving the 2 doses of nesiritide reported an improvement in their 6-hour global clinical status (60% and 67% versus 14%) and a reduction in their dyspnea (57% and 53% versus 12%) and fatigue (32% and 38% versus 5%) compared with placebo.

In the comparative second portion of the Colucci et al11 trial, 305 patients were randomly assigned to one of the 2 doses of nesiritide or open-label standard therapy (102 patients) that included dobutamine (57%), milrinone (19%), nitroglycerin (18%), dopamine (6%) and amrinone (1%). Global clinical status and dyspnea and fatigue were measured in each patient at baseline, 6 hours, and 24 hours and at the end of their treatment. Patients in all 3 arms of the study reported improvement after treatment, and there were no statistically significant differences among the 3 groups. The authors concluded that “in patients hospitalized with decompensated congestive heart failure, nesiritide improves hemodynamic function and clinical status.”11 The authors mentioned that dose-related hypotension was the most common adverse event and that this was usually asymptomatic or mild. This article reported that the rate of death from all causes up to 3 weeks after enrollment in the study was 6% and was similar among the 3 treatment groups. In August 2001, nesiritide was approved by the FDA for the short-term treatment of acute, decompensated, congestive heart failure.

In March 2002, JAMA published the results of the industry-supported VMAC trial.1 VMAC was a randomized, double-blind, clinical trial of 489 hospitalized patients with decompensated congestive heart failure who were randomized to intravenous nesiritide, intravenous nitroglycerin, or placebo added to standard medications. The study measured changes in PCWP among patients who had a pulmonary artery catheter and dyspnea self-assessments at 3 hours after treatment. Additional secondary measures included patient global clinical status assessment and safety evaluation of the medications. The study reported that the patients treated with nesiritide had a significant decrease in PCWP when compared with patients treated with nitroglycerin and placebo. Patients treated with nesiritide also reported improvements in dyspnea compared with placebo. However, there were no statistically significant differences in dyspnea or global clinical status improvements between nesiritide and nitroglycerin. Critics of the VMAC trial contend that the nitroglycerin was not titrated aggressively.12, 13 In patients with acutely decompensated heart failure and acute pulmonary edema, emergency medicine teaching is to increase the intravenous nitroglycerin infusion by 5 to 10 μg/minute every 3 to 5 minutes as tolerated by the patient's hemodynamics up to a maximum dose of 200 to 300 μg/minute.14 The mean dosages of nitroglycerin given in the VMAC study in noncatheterized and catheterized patients were 29 μg/minute and 42 μg/minute at the 3-hour point, respectively. The median dose was 13 μg/minute in both groups at the 3-hour point. The nitroglycerin dose continued to be titrated up in the catheterized patients between the 3-hour point and the 24-hour measurement to a mean dose of 56 μg/minute and a median dose of 20 μg/minute.1

Later in 2002, the industry-supported Prospective Randomized Evaluation of Cardiac Ectopy with Dobutamine or Natrecor Therapy (PRECEDENT) study prospectively evaluated the differential effects of 2 fixed doses of nesiritide (0.015 or 0.030 μg/kg/minute, with no preceding bolus) and dobutamine (≥5 μg/kg/minute) on pulse rate and episodes of ventricular ectopy during the treatment of acutely decompensated heart failure in hospitalized patients.15 The authors reported that dobutamine was associated with greater rates of ventricular ectopy and that nesiritide did not increase ventricular ectopy or pulse rate. In September 2004, Yancy et al16 published the results of the industry-supported Follow-Up Serial Infusions of Nesiritide (FUSION I) pilot trial, which randomized 210 patients to standard treatment or weekly outpatient nesiritide infusions in addition to standard treatment. Mortality rates and the frequency of adverse events were similar between the groups.

In March 2005, Sackner-Bernstein et al17 published an article in Circulation reporting that their meta-analysis of clinical trials involving nesiritide demonstrated that nesiritide significantly increases the risk of worsening renal function in patients treated for acutely decompensated heart failure. This article was followed in April 2005 by a companion article by Sackner-Bernstein et al2 that reported a pooled analysis of the randomized controlled trials evaluating nesiritide. Of the 12 trials that evaluated nesiritide, 3 met their inclusion criteria: randomized, double-blind study of patients with acutely decompensated heart failure, single infusion therapy, inotrope not mandated as control, and reported 30-day mortality.2 In the 3 trials, a total of 485 patients were randomized to nesiritide and 377 to control treatment. Thirty-day mortality was higher in patients randomized to nesiritide (35/485; 7.2%) compared with those treated with control (15/377; 4.0%). The authors concluded that nesiritide treatment may be associated with an increased risk of death in patients treated for acutely decompensated heart failure.

In July 2005, Topol12 wrote a commentary in the New England Journal of Medicine, in which he reviewed the aforementioned studies and stated that “nesiritide has not yet met the minimal criteria for safety and efficacy.” Topol also questioned why a medication “that is associated with higher rates of both renal dysfunction and death than placebo—and that costs 50 times as much as standard therapies” was administered to more than 600,000 patients.12

In October 2005, Peacock et al18 reported the results of the industry-supported Prospective Randomized Outcomes Study of Acutely decompensated CHF Treated Initially as Outpatients with Nesiritide (PROACTION) trial, a multicenter, double-blinded, placebo-controlled pilot study that evaluated the efficacy and safety of standard treatment with nesiritide or placebo in 237 ED/observation unit patients with acutely decompensated heart failure.18 This was the only ED-based study of nesiritide and evaluated the following measures: need for initial admission, length of hospital stay, and 30-day inpatient rehospitalization. The authors concluded that nesiritide was safe to use in the ED and found lower rates of all 3 outcome measures in the nesiritide group compared with the control group.18 The lead author did submit a correction to the original data after a joint FDA and Scios (the drug's manufacturer) review of 180-day mortality, which showed the 30-day unadjusted mortality in the nesiritide group to be 7 of 120 (5.9%) versus 1 of 117 (0.9%) in the placebo group.19 The authors report that this was not a statistically significant difference according to a P value of .066, and we confirm this calculation as follows:
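
The original calculation is not reproduced in this text. As a minimal sketch, assuming the reported P value came from a two-sided Fisher exact test (the test used is an assumption, not stated above), the 30-day mortality counts of 7 of 120 versus 1 of 117 can be checked with the Python standard library alone. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    all_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / all_tables

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))

# Rows: nesiritide, placebo. Columns: died by 30 days, survived.
p_value = fisher_exact_two_sided(7, 113, 1, 116)
print(f"two-sided P = {p_value:.3f}")
```

Under that assumption the result is about .066, in agreement with the value the authors reported.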

However, given the small number of events, an alternative way to look at this problem is to say that there was 5% (95% confidence interval [CI] 0.5% to 9%) higher mortality in the nesiritide group, ie, their data are compatible with anything from essentially no increased mortality to a 9% increase in mortality. Given previous reports of an association of nesiritide use with increased mortality, we must be concerned that the true value could lie in the higher end of this range.
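
The interval can be approximated with a simple normal-approximation (Wald) confidence interval for a difference in proportions. This is a sketch under the assumption that a Wald interval was used; the method is not stated in the text, and with so few events other interval methods would give somewhat different limits.

```python
from math import sqrt

def wald_ci_risk_difference(x1, n1, x2, n2, z=1.96):
    """Risk difference (group 1 minus group 2) with a Wald 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# 30-day deaths: 7 of 120 with nesiritide, 1 of 117 with placebo.
diff, lower, upper = wald_ci_risk_difference(7, 120, 1, 117)
print(f"difference = {diff:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The computed difference is close to 5%, with a lower limit near 0.5% and an upper limit between 9% and 10%, consistent with the range quoted above.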

In 2008, emergency physicians must decide whether nesiritide provides an important benefit in the acute treatment of patients with acutely decompensated heart failure compared to the other treatment options. If nesiritide does provide a treatment benefit over other standard therapies, is this benefit great enough to outweigh the previously reported potential association with worsening renal function and increased 30-day mortality? Given the lengthy, successful experience with intravenous nitroglycerin for congestive heart failure, we find the literature insufficient to support a change to nesiritide.

Q1.b According to the literature review discussed in the first part of this question, at the inception of this study Miller and his fellow investigators were aware of the results of the VMAC trial and possibly the PRECEDENT trial because all of these trials, including the Miller et al study, were sponsored by the manufacturer of nesiritide.3 Therefore, at study design, these authors were aware of the potential hemodynamic and short-term clinical symptom improvement in patients given nesiritide. The investigators were also likely aware of criticism about the dosing of nitroglycerin in the VMAC study.12, 13 However, the authors chose not to use nitroglycerin in their standard treatment of acutely decompensated heart failure and excluded patients who required nitroglycerin infusions.

In February 2004, when the study began, the authors may not have been aware of the association with worsening renal function and increased 30-day mortality. But by the end of the study in June 2005, a number of critical events had taken place. In January 2005, Scios met with the FDA to review mortality data on nesiritide.19 In March and April 2005, the 2 articles by Sackner-Bernstein et al2, 17 were published, and the critical commentary in the New England Journal of Medicine appeared shortly after the study was completed.12 Given these reported concerns for worsening renal function and increased mortality, Miller et al addressed these issues in their “Discussion” while acknowledging that their study was not powered to detect either a difference in renal impairment or an increase in mortality.3

Q1.c Hauptman et al4 suggest that a number of factors contributed to the rapid decrease in use of nesiritide. These factors included the 2 publications that suggested increased mortality and worsening renal injury with nesiritide, release of a panel summary led by Eugene Braunwald that recommended nesiritide use in only the most acutely ill inpatients, and publication of a critical commentary in the New England Journal of Medicine. The authors suggest that physician practice might react more rapidly to negative postapproval data than positive efficacy results.4 When potential serious adverse effects are associated with a new medication, physicians' reasons for avoiding the medication might include concern for patient safety, concern for additional unreported adverse effects, and potential medicolegal liability. It is not uncommon to see a television commercial asking patients potentially harmed by a recently recalled medication to join a class action suit. Because the clinical trials that provide the evidence used to gain FDA approval for a new drug are grossly underpowered to detect rare but important adverse effects, post-FDA approval studies that evaluate treatment safety and efficacy are a crucial aspect of medical research and overall patient safety. Our attempt to obtain current nesiritide use data from the (877) 4NATREC (OR) information line was not answered.
However, the New York Times reported in February 2006 that Scios, nesiritide's original manufacturer, planned to lay off 150 of their 900 employees and that the company was struggling because of nesiritide's decreasing sales.20 The aforementioned studies and critical editorials decreased nesiritide use, and in the transcript of a Johnson & Johnson fourth-quarter 2007 earnings conference call, the manufacturer of nesiritide reports a one-time charge of “$441 million for the write-down of the intangible asset related to NATRECOR.”21 What this accounting jargon means is that Johnson & Johnson is not going to receive revenue or profits from sales of Natrecor that were anticipated when they purchased Scios in 2003 for $2.4 billion. The Johnson & Johnson executives have determined that Natrecor is not going to be as profitable as once expected, which ultimately led to this action. This write-down would be expected to adversely affect their earnings forecast and possibly lead to a lower stock price for Johnson & Johnson.

There are a number of sources of information for clinicians who wish to find out how often a drug is being prescribed. Although some databases are proprietary and costly to obtain, prescription rates of certain medications can be found in Premier's Perspective Comparative Database, which was used by Hauptman et al4 in their analysis. Additional resources include Medical Expenditure Panel Survey, Medicare Current Beneficiary Survey, National Health and Nutrition Examination Surveys, United Kingdom General Practice Research Dataset, UnitedHealth Group, and Kaiser Permanente Medical Care Program.22


Answer 2 

Q2.a Most medical studies seek to determine whether an exposure or treatment causes a change in outcome. Unfortunately, we cannot measure cause directly; we can only measure the association between variables. Consider a study that asked a random sample of persons aged 55 to 75 years whether they ever had lung cancer and whether, before the lung cancer, they drank more than 10 units of alcohol each week. Investigators might report that of 10,000 persons surveyed:

                      Lung cancer   No lung cancer    Total
High alcohol use          160            2,840        3,000
Low alcohol use           240            6,760        7,000
Total                     400            9,600       10,000

The probability of having lung cancer if you use a lot of alcohol is 160/3000, or .053, and is 240/7000, or .034, if you do not. This produces a risk ratio of .053/.034=1.6 (95% CI 1.3 to 1.9), with a P<.0001. Does this mean that alcohol causes lung cancer? It would if the causal diagram for the relationship between alcohol intake and lung cancer looked like this:

Causal Diagram 1: Alcohol use → Lung cancer

Causal diagrams5, 6, 23 are used to show the relationship between exposures (which could be treatments), outcomes, confounders, effect modifiers, and intermediaries. In this first causal diagram, the direction of the single-headed arrow implies that alcohol causes lung cancer but not vice versa. The absence of other arrows implies that nothing else causes lung cancer and nothing confounds the relationship between alcohol and lung cancer. Unfortunately for scientists, little in clinical medicine looks like this. At best the situation is:

Causal Diagram 2: Alcohol use → Lung cancer; Confounder — Alcohol use (association); Confounder → Lung cancer

A confounder is a condition that is associated with both the exposure and the outcome and causes the observed association between the exposure and the outcome to represent something other than the pure causal relationship between these variables. In the first figure, the observed association between alcohol and lung cancer is due solely to the effect of alcohol on lung cancer. In the second figure, the observed association between alcohol and lung cancer has 2 components: (1) the direct (causal) effect of alcohol on lung cancer, and (2) the indirect effect caused by alcohol's association with the confounder and the confounder's causal association with lung cancer. As a result of the existence of a “backdoor pathway” between alcohol and lung cancer through the confounder, the association of alcohol with lung cancer is a confounded (or biased) estimate of the effect of alcohol on lung cancer.

We would expect that by now many of you have replaced the generic “confounder” with the specific risk factor “smoking” because those who smoke are likely to drink and those who smoke are likely to get lung cancer. In theory, any observed association between alcohol and lung cancer could be due solely to a direct effect, solely to the confounding effect of smoking, or some combination of the 2. Furthermore, the absence of an observed association between a treatment and an outcome does not mean that there is no causal treatment effect. It is possible that a confounder's effect is in the opposite direction of the treatment effect and the 2 effects cancel, thereby removing an association that would have been observed had there not been confounding.

For those more comfortable with tables, we review how smoking confounds the relationship of alcohol and lung cancer by stratifying the above 2×2 table into separate tables for smokers and nonsmokers. We get:

Note that there is no effect of alcohol on lung cancer in either table. For the smokers, the risk of lung cancer is 0.12 regardless of alcohol stratum. For nonsmokers, the risk is 0.02 regardless of alcohol stratum. This exaggerated example (if only the numbers were always this clean) shows how a confounder (smoking) can create an observed association between 2 variables (in this case a risk ratio of 1.6 between alcohol use and lung cancer) when there actually is no association. The ultrasmall P value for the aggregate (confounded) table simply tells us that results that extreme are likely not caused by chance but says nothing about whether there is bias. An investigator who, according to the crude, nonstratified table, reported the risk ratio of 1.6 as an unbiased estimate of the effect of alcohol on lung cancer would be doing patients and the scientific community a disservice.
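
The stratified tables are not reproduced in this text. The stratum counts in the sketch below are a reconstruction, not the article's own table: they are the counts implied by the stated risks (0.12 in smokers, 0.02 in nonsmokers) together with the crude totals of 160/3000 and 240/7000. The code collapses the strata back to the crude table and compares the crude and stratum-specific risk ratios.

```python
# (cases, persons) by smoking status and alcohol use.
# Reconstructed from the stated risks and crude totals; not copied from the article.
strata = {
    "smokers":    {"high_alcohol": (120, 1000), "low_alcohol": (120, 1000)},
    "nonsmokers": {"high_alcohol": (40, 2000),  "low_alcohol": (120, 6000)},
}

def risk_ratio(high, low):
    """Risk in the high-alcohol group divided by risk in the low-alcohol group."""
    return (high[0] / high[1]) / (low[0] / low[1])

# Collapse over smoking status to recover the crude (unstratified) table.
crude = {
    level: tuple(sum(strata[s][level][i] for s in strata) for i in range(2))
    for level in ("high_alcohol", "low_alcohol")
}

crude_rr = risk_ratio(crude["high_alcohol"], crude["low_alcohol"])
stratum_rr = {s: risk_ratio(t["high_alcohol"], t["low_alcohol"])
              for s, t in strata.items()}

print("crude table:", crude)
print(f"crude risk ratio: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"risk ratio among {name}: {rr:.2f}")
```

The crude risk ratio is about 1.6, whereas the risk ratio within each smoking stratum is 1.0. The whole crude association is produced by the uneven distribution of smokers between the alcohol groups.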

How can we prove that A causes B? How can we be sure that there is not an unknown or unspecified confounder that is biasing the relationship we desire to measure? The answer is that we cannot be sure. Hume, that great kilted philosopher of the Scottish Enlightenment, shed some light on this subject with his counterfactual definition of causation. Hume argued: A causes B if and only if we always get B when we have condition A and we never get B when we do not have condition A. In logical terms, if A, then B; if not A, then not B. This makes sense because if A truly is the cause of B then we always should be able to produce B through A and, in the absence of A we should not observe B. The problem with this definition is that either A or not A is contrary to fact and therefore “counterfactual.” One can either give a patient nesiritide or not, but one cannot do both. We can observe the effect of treatment A or we can observe the results of treatment not A, but we cannot do both. Experimental science consists of developing methods that come as close as possible to observing A and not A simultaneously. For example, we can conceive of 2 flasks of Escherichia coli raised under such similar conditions that the 2 flasks could be considered identical such that if both flasks received treatment A they would undergo identical change and if both flasks received not A they would also change identically. If these conditions were true, then any difference between flasks when one is given A and the other not A would capture the effect of A on the flask. If all the bacteria in the A flask died and all those in the not A flask lived, we could argue that the exposure to A caused the death. But a skeptic could always argue that a disgruntled laboratory technician spit in one flask while conducting the experiment, or that there was residual soap in one of the flasks, or that one flask was in a sunny window whereas the other was a few inches over in the shadow. 
The truth is that we can never prove causation; we can only create situations in which alternative explanations are so far-fetched that we are satisfied that A causing B is the most likely explanation for our observations.

If we accept this formulation of causation, we can define a confounder as any factor that would make the exposed (treatment) group's outcome different from the unexposed (control) group's outcome, given that both groups were exposed to the same conditions.24 In our example above, the persons in the high-alcohol-use group would have more lung cancer than those in the low-use group even if the 2 groups had been forced to have identical alcohol intake because there would still be more smoking in the high-alcohol-use group. We can therefore say that smoking confounds the observed relationship of alcohol and lung cancer.

Q2.b In question 2a, we learned that a necessary condition for confounding is an association of the confounder with both the treatment and the outcome—a backdoor pathway that renders the observed association between treatment and outcome subject to influences besides the treatment effect. If we can remove the arrow (the association) between the confounder(s) and the treatment, we can eliminate the possibility of confounding. Randomization changes Causal Diagram 2 into Causal Diagram 3:

Causal Diagram 3: Alcohol use → Lung cancer; Confounder → Lung cancer (no arrow between the confounder and alcohol use)

Ethics aside, if we could force people to drink or not drink, with each person's drinking status determined randomly, there should be no association between the confounders and alcohol use because alcohol use is randomly assigned. By erasing the arrow between the confounder and alcohol use, we interrupt the backdoor pathway, leaving only the “treatment” effect between alcohol use and lung cancer. The observed association between these variables should be a direct measure of the treatment effect. Thus, in theory, randomization can be used to achieve a circumstance similar to the 2 flasks of E coli, in which the 2 groups are identical in all respects except for treatment status. This is the closest we can come to conditions that meet Hume's counterfactual definition of causation.
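
A small simulation can illustrate the argument. The sketch below is hypothetical: it assumes alcohol has no effect on lung cancer, that 20% of people smoke, that smoking raises lung cancer risk from 0.02 to 0.12, and that, without randomization, smokers are twice as likely as nonsmokers to be heavy drinkers (0.50 versus 0.25). These values were chosen to match the running example; the function name and seed are illustrative.

```python
import random

def simulated_risk_ratio(n, randomized, seed=0):
    """Risk ratio for lung cancer, heavy vs light drinkers, when alcohol has no effect."""
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        smoker = rng.random() < 0.20
        if randomized:
            # Drinking is assigned by coin flip, independent of smoking.
            drinks = rng.random() < 0.50
        else:
            # Observational setting: smokers are more likely to drink heavily.
            drinks = rng.random() < (0.50 if smoker else 0.25)
        # Cancer risk depends only on smoking, never on alcohol.
        cancer = rng.random() < (0.12 if smoker else 0.02)
        totals[drinks] += 1
        cases[drinks] += cancer
    return (cases[True] / totals[True]) / (cases[False] / totals[False])

observational_rr = simulated_risk_ratio(200_000, randomized=False)
randomized_rr = simulated_risk_ratio(200_000, randomized=True)
print(f"observational risk ratio: {observational_rr:.2f}")
print(f"randomized risk ratio:    {randomized_rr:.2f}")
```

With 200,000 simulated persons, the observational risk ratio should land near 1.6 and the randomized one near 1.0, because random assignment breaks the link between smoking and drinking.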

Unfortunately, although randomization elegantly erased the arrow between confounders and treatments in Causal Diagram 2 above, reality is a bit more complex. First and foremost, randomization is not always possible. It is ethically untenable and logistically impossible to force individuals to drink a certain amount of alcohol each week for say 20 years. It is also ethically unacceptable to randomize patients to placebo when effective treatments are believed to exist, even if the evidence that these treatments work is shaky. It is therefore hard to conduct trials on established therapies because we cannot generate an appropriate control group. It is also logistically difficult to perform randomized trials when there is a long delay between exposure and outcome. We could randomize stable trauma patients to total body computed tomography (pan-CT) or observation, but it would be difficult to follow these patients long enough (let alone control subsequent exposure to radiation and carcinogens) to determine whether one group was at higher risk for cancer as a result of radiation from the pan-CT.

Even when randomization is ethically and logistically possible, there are potential problems. First, randomization works only when numbers are large. Although clinicians tend to think of a 500-person study as “large,” statisticians have numbers such as 10,000 in mind. Randomize the emergency physicians at your hospital into 2 groups and you can be certain that despite random assignment, there will be lots of differences between the groups. On average, one group might be taller, older, wiser, or better dressed, but you can be certain that when only 20 or 30 subjects are randomized there will be some characteristic that differs between groups. Such differences reestablish the arrow between “confounder” and “exposure” that randomization was supposed to remove and therefore reintroduce the possibility of confounding. If instead of randomizing your emergency physicians, you randomized every individual in New York City (8 million people), you can be fairly certain that the 2 groups will be well balanced in all characteristics, including characteristics one can measure (eg, height) and those one cannot (eg, karma).

To get a feel for what kinds of numbers are needed to produce essentially equivalent groups, we perform a simulation. We start with a population of 100,000 subjects, each of whom has a characteristic X that is binary (ie, every individual is either “yes” or “no” for characteristic X); 50% of the population is “yes” for X and 50% “no” (P(yes)=P(no)=.5). We randomly select 2 groups, each with 25 persons, from this population and calculate how the 2 groups differ with respect to the percentage of subjects that are “yes.” We repeat this process 500 times and graph the difference between groups for each of the runs. We get the top panel in the following graph, which shows that the percentage of subjects who are “yes” for variable X will differ by 10% or more in about 48% of the runs. If there were 4 important variables and we randomized 25 subjects to each limb, all 4 variables would be within 10% in the 2 groups only 0.52^4 ≈ 7% of the time!
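
The simulation described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code; the function name, the seed, and the assumption that the 4 variables are independent are ours.

```python
import random

def fraction_of_runs_differing(group_size=25, runs=500, threshold_pct=10,
                               population_size=100_000, seed=1):
    """Fraction of runs in which the 2 groups differ by threshold_pct or more."""
    rng = random.Random(seed)
    half = population_size // 2
    # Half of the population is "yes" (True) for characteristic X.
    population = [True] * half + [False] * (population_size - half)
    count = 0
    for _ in range(runs):
        drawn = rng.sample(population, 2 * group_size)
        yes_a = sum(drawn[:group_size])
        yes_b = sum(drawn[group_size:])
        # Integer comparison avoids floating-point trouble at the threshold.
        if abs(yes_a - yes_b) * 100 >= threshold_pct * group_size:
            count += 1
    return count / runs

fraction = fraction_of_runs_differing()
print(f"runs differing by 10% or more: {fraction:.0%}")

# Chance that 4 independent variables are all within 10% in the 2 groups,
# using the 48% figure from the text.
p_all_four_balanced = (1 - 0.48) ** 4
print(f"all 4 variables within 10%: {p_all_four_balanced:.1%}")
```

Because only 500 runs are used, the simulated fraction will vary by a few percentage points around 48% from seed to seed.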

The bottom panel of the graph shows what happens to the difference in the percentage of subjects who are “yes” for characteristic X between the 2 limbs as the sample size increases. Below 500 subjects, we see that the 2 groups will frequently differ by 10% or more. At 500 subjects per limb, the 2 limbs will differ by 10% or more rarely but will differ by 5% or more about 13% of the time. At 5,000 subjects per limb, differences of greater than 5% will almost never be expected, but differences greater than 2% will be expected about 5% of the time and differences greater than 1%, about 30% of the time. This exercise demonstrates that randomization, as conducted in the small (<250 subjects per group) randomized trials that predominate in clinical medicine, in no way guarantees that groups are equivalent with respect to all potential confounders. Therefore, it cannot be assumed that randomization has rendered the study immune to bias.
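Readers who wish to repeat this exercise can do so with a few lines of code. The following is a minimal sketch in Python, not the program used to produce the graph; the function name `imbalance_rate`, its parameters, and the random seed are our own assumptions. It draws 2 groups without replacement from a population of 100,000 in which half are “yes,” and reports how often the groups differ by at least a chosen number of percentage points.

```python
import random


def imbalance_rate(n_per_group, threshold_pct, runs=500, seed=0):
    """Fraction of runs in which the two groups' "yes" percentages
    differ by at least threshold_pct percentage points."""
    rng = random.Random(seed)
    # Population of 100,000 subjects: half "yes" (1), half "no" (0).
    population = [1] * 50_000 + [0] * 50_000
    hits = 0
    for _ in range(runs):
        # Draw both groups at once, without replacement, then split them.
        drawn = rng.sample(population, 2 * n_per_group)
        yes_a = sum(drawn[:n_per_group])
        yes_b = sum(drawn[n_per_group:])
        # Integer comparison avoids floating-point trouble at the boundary.
        if abs(yes_a - yes_b) * 100 >= threshold_pct * n_per_group:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    rate = imbalance_rate(25, 10)
    print(f"25 per group, differ by 10 points or more: {rate:.0%} of runs")
    # Assumes the 4 variables are independent of one another.
    print(f"all 4 variables within 10 points: {(1 - rate) ** 4:.0%} of trials")
    for n in (100, 500):
        print(f"{n} per group, differ by 5 points or more: {imbalance_rate(n, 5):.0%}")
```

Because the runs are random, the percentages will vary somewhat with the seed and the number of runs, but with 25 subjects per group they should land close to the figure quoted above.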

Even in very large randomized trials in which the number of subjects is sufficient to ensure that randomization has done its job and created equal groups, we are not guaranteed that the study is not confounded. Such factors as noncompliance, loss to follow-up, and measurement error can reintroduce confounding and bias.

Imagine that patients with an abdominal aortic aneurysm greater than 6 cm were randomized to surgery or medical management to determine which treatment reduces mortality. In this fantastic trial, there are 10,000 subjects in each group and the randomization is executed perfectly. This surely will settle whether medical or surgical therapy is better, right? Perhaps not. Consider the following potential problems:

1. Patients can change their mind. What if some patients randomized to medical treatment opted to have surgery? We then have 2 entities, the treatment group defined by the randomization process and the treatment group defined by what actually occurred. Causal diagram 4 depicts this.

We now have a problem. The causal pathway between the groups defined by randomization and the outcome (the dark dashed line) remains free of confounding (there is no backdoor pathway) but is not what we want to know about. For example, if everyone in the medication group opted for surgery, then the surgery group and the medication group should have identical outcomes. There would be no association between treatment and outcome, and we might erroneously conclude that there is no difference between surgical and medical management. This is the problem with the oft-performed intention-to-treat analysis. This analysis is unlikely to be confounded but may fail to answer the question we set out to ask. The reason intention-to-treat analysis is widely advocated is that the alternative is even worse.

If we analyze the relationship between actual management and survival, we must recognize that there is the direct causal pathway (the dark dotted arrow) we want and a backdoor pathway (through the lighter dashed arrow). How did the confounder (aneurysm size) reenter the picture? It was not present at first because randomization determined who had surgery, not aneurysm size. But once patients in the medical arm start to opt for surgery, one can guess that those with larger aneurysms are more likely to jump ship than those with smaller ones. This reestablishes a relationship between aneurysm size and treatment method and thereby creates confounding through the backdoor pathway: actual management→aneurysm size→survival. The crude association between actual treatment and survival will not be an unbiased measure of the effect of surgery on survival because sicker patients in the medical group may have opted for surgery and these patients are more likely to have a poor outcome.

2. There are many other factors that could reintroduce confounding into our once pristine randomized trial. If surgeons with better interpersonal skills also had better technical skills, then those patients randomized to surgery who had nasty doctors might be more likely to cancel surgery. Those patients who actually have surgery would, on average, have had surgeons with better technical skill. This introduces bias by overestimating the expected outcome of the surgical limb.
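The crossover problem described in point 1 can be made concrete with a little arithmetic. The sketch below is illustrative only: every number in it (the mortality risks, the share of large aneurysms, and the crossover probabilities) is hypothetical and was chosen to keep the calculation simple, and the function name `analyse` is our own. It compares the true effect of surgery with the estimates produced by an intention-to-treat analysis and by an analysis according to the treatment actually received.

```python
# All numbers below are hypothetical, chosen only to make the arithmetic easy.
P_LARGE = 0.5                      # share of patients with a large aneurysm
MORTALITY = {                      # (aneurysm size, actual treatment) -> risk of death
    ("small", "medical"): 0.20, ("small", "surgery"): 0.10,
    ("large", "medical"): 0.60, ("large", "surgery"): 0.30,
}
CROSSOVER = {"small": 0.0, "large": 0.5}   # P(medical-arm patient opts for surgery)


def analyse():
    share = {"small": 1 - P_LARGE, "large": P_LARGE}
    # Effect of surgery on mortality if everyone were treated as assigned.
    true_rd = sum(
        share[s] * (MORTALITY[s, "surgery"] - MORTALITY[s, "medical"]) for s in share
    )

    # cells: (assigned arm, actual treatment, size) -> fraction of all enrolled
    cells = {}
    for s in share:
        cells["surgery", "surgery", s] = 0.5 * share[s]
        cells["medical", "surgery", s] = 0.5 * share[s] * CROSSOVER[s]
        cells["medical", "medical", s] = 0.5 * share[s] * (1 - CROSSOVER[s])

    def risk(position, label):
        # Mortality among cells whose key matches label at the given position
        # (0 = assigned arm, 1 = treatment actually received).
        chosen = {k: w for k, w in cells.items() if k[position] == label}
        deaths = sum(w * MORTALITY[k[2], k[1]] for k, w in chosen.items())
        return deaths / sum(chosen.values())

    itt_rd = risk(0, "surgery") - risk(0, "medical")
    as_treated_rd = risk(1, "surgery") - risk(1, "medical")
    return true_rd, itt_rd, as_treated_rd


if __name__ == "__main__":
    true_rd, itt_rd, as_treated_rd = analyse()
    print(f"true risk difference:       {true_rd:+.3f}")
    print(f"intention-to-treat:         {itt_rd:+.3f}")
    print(f"as-treated (actual care):   {as_treated_rd:+.3f}")
```

Under these assumed numbers, both analyses understate the benefit of surgery. The intention-to-treat estimate is diluted because some patients in the medical arm had surgery, and the as-treated estimate is biased because the patients who crossed over were the sicker ones.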

Although measurement error is unlikely to affect a trial whose outcome is survival, it is possible that loss to follow-up could cause confounding if the probability of loss to follow-up was associated with the treatment and with the probability of survival. There are many other ways that randomized trials can become confounded. By reading the references, readers who desire can become skilled at drawing causal diagrams and analyzing potential confounding. The important point is that although a randomized trial is often the closest we can come to establishing causation, it is by no means a guarantee of an unconfounded result, particularly when the number of subjects is not large.

Q2.c Table 1 of most clinical randomized trials is a table that shows the demographic and clinical characteristics of subjects in each group. The purpose of this table is to provide some indication of whether the groups are comparable. By “comparable” we mean, are the groups sufficiently similar that if both groups were given the same treatment, we would expect the same outcomes? We recognize this as the equivalent of saying, is the trial confounded? Again, confounding occurs when 2 groups would have had different outcomes despite being given the same treatment.

How should we use the information in a Table 1? Rule number 1 is do not perform statistical tests.25, 26 Some wide-circulation journals such as one published in a town that once had a tea party continue to do this, but others, such as this journal and Annals of Internal Medicine, have recognized the inappropriateness of this approach. It is inappropriate for 2 reasons. First, confounding is inherent in the clinical problem and has nothing to do with statistical significance or randomness. Either there is a backdoor pathway or there is not. The numbers do not matter. Because confounding can exist in the absence of statistically significant differences (2 patients in the 10-patient placebo group and no patients in the 10-patient treatment group were near dead on arrival; P=.13) and statistically significant differences can occur in the absence of confounding (there were more Virgos in the treatment group than the placebo group; P<.00001), there is no point looking at P values because they tell us nothing about confounding. Second, even if one erroneously believed that frequentist statistics can inform decisions about confounding, it must be acknowledged that studies are not powered to detect differences between baseline characteristics; they are powered to detect changes in outcome. Typically, the desired change in outcome is much larger than the kinds of differences that might raise concerns about confounding, and therefore studies are typically underpowered to detect differences in baseline characteristics.

Rule number 2 is to recognize that confounding can occur despite the mean value of a potential confounder being the same in each group. Consider a randomized trial of 2 methods of trauma care with the outcome of death. Table 1 of the article reports that the mean systolic blood pressure for the 2 groups was 90 (SD 6) and 89 (SD 6). Can we be assured that systolic blood pressure is not a confounder in this study? If the distributions look like these, we might not be too concerned.

But what if the distributions had means and SDs of 90 (8) and 89 (6) and looked like this:

The means and SDs are similar (P=.13), but the 3 extremely hypotensive patients in one group might have a much lower probability of survival than any patient in the other group. These 3 patients could easily confound the study. From this example, we hope you learn that simply seeing the central tendency of a distribution is not enough to make an informed decision about the possibility of confounding. In many circumstances, confounding occurs at the extremes of a distribution, and without seeing the actual distribution it is difficult to know whether confounding is an issue.

Table 1 of the Miller et al article presents a variety of information about the participants in this study. Again, what we need to know is, if the 2 groups had been given the same treatment, would they have had the same outcome? Although this is technically unknowable, we might be satisfied with, “are these 2 groups equally ill?” There is no variable presented in Table 1 that truly answers this question. Although there is no difference that is glaringly huge, the 10% difference in the number of patients who had been hospitalized in the past 2 months, the 8% difference in the number of patients with class IV heart failure, and the 6 extra patients in the nesiritide group with end-stage renal disease all could confound the study. We are not shown distributions of continuous variables, so it is hard to know whether the possibility of confounding is great. Finally, an entry at the bottom of the table must have snuck past this journal's statisticians: “All demographic and laboratory data in Table 1 with the exception of Non-Insulin Treated Diabetes and Sodium did not demonstrate statistically significant differences.” We hope that readers now recognize the meaninglessness of this statement; it has little bearing on whether the study is confounded.

In summary, it is hard to know whether this small study is confounded. Given the small sample size (see the exercise above), we can be certain that some clinical variables are unequally distributed between the groups. The question is, are those variables, be they measured or unmeasured, confounders? Table 1 does not provide a tremendous amount of help. We would need to pore over the actual data, using stratification to determine whether some of the seemingly sicker patients: (a) were disproportionately assigned to one group and (b) had worse outcomes. Such stratified analyses, however, will quickly encounter small sample sizes. Ultimately, there is no logical method or mechanical technique that can determine the extent to which this study is confounded. Such a determination is necessarily subjective. Sensitivity analyses can help and will be considered in a future journal club.

Answer 3 

Q3.a Miller et al wanted to measure the effect of nesiritide in ED patients with acutely decompensated congestive heart failure.3 Ideally, an investigator seeking to answer this question would enroll all such patients. However, patients often have multiple disease processes, and it might not be initially clear whether a complaint of dyspnea is a result of congestive heart failure or another disease process. Therefore, these authors chose to sacrifice some breadth (enrolling all comers) to have a clearly defined patient population. By having inclusion criteria that required known history of congestive heart failure, dyspnea at rest or with minimal exertion and with a respiratory rate greater than 24 breaths/min, evidence of volume overload according to physical examination findings or chest radiograph, and a brain natriuretic peptide (BNP) level greater than 100 pg/ml,20 the authors ensured that enrollees were likely to have congestive heart failure.

By excluding patients with new-onset congestive heart failure, they limit the generalizability of their findings. Patients with new-onset congestive heart failure might have different treatment response rates than patients with established congestive heart failure. In an urban setting in which primary care follow-up might be more of a problem, patients with new-onset congestive heart failure might return more frequently to the ED and therefore require readmission because of unfamiliarity with their medications and lifestyle modifications required with congestive heart failure.

The authors also excluded patients actively using nitroglycerin in the ED. It is not clear whether this included patients given sublingual nitroglycerin by emergency medical services before arrival or patients who received nitroglycerin sublingually or transdermally in the ED. The exclusion of these patients might have resulted in a selection bias by excluding patients with more severe pulmonary edema who required nitroglycerin in addition to loop diuretics. This group of patients might have different response rates to nesiritide and require more frequent ED visits and admissions than the enrolled patients in the study. Intravenous nesiritide could replace intravenous nitroglycerin or could augment it, but this study explores neither of these possible uses.

The authors used both subjective (patient self-reporting of symptoms) and objective (BNP level >100 pg/ml, chest radiograph findings) criteria to define an acute decompensation. They used a new questionnaire to collect this information, and the limitations of using a nonvalidated instrument will be covered in a future Journal Club. In the accompanying editorial, Diercks27 points out that a potential criticism of these inclusion criteria is that the reader is unable to know the magnitude of decompensation from the patient's baseline. Is a BNP of 150 pg/ml markedly abnormal for this patient or is that value near the patient's baseline? It is essential for the reader to closely examine both the inclusion and exclusion criteria and determine whether the patient population enrolled was appropriate for the study hypothesis.

As discussed above, one cannot generalize the results of this study to patients with congestive heart failure who are treated with nitroglycerin because these patients were excluded from this study. The authors do mention that the use of intravenous nitroglycerin was not a part of their hospital's standard treatment of congestive heart failure. Few of the previous trials have compared nesiritide and nitroglycerin for the treatment of congestive heart failure. One trial that did compare the 2 medications was the VMAC trial.1 In that trial, the authors compared nesiritide with nitroglycerin and found a greater reduction in pulmonary capillary wedge pressure without a significant difference in dyspnea in patients given nesiritide compared to nitroglycerin. Critics of this study report that the average nitroglycerin dose was less than the typically administered dose in clinical practice, thus confounding the results.12, 13 Further criticism has questioned the added benefit of this medication compared to the standard treatments.12 There continues to be debate about the use of nesiritide in the ED. The results of this study do not demonstrate a benefit. However, as mentioned above, one must always examine the population studied when interpreting the authors' conclusions.

Q3.b Colucci et al11 state in the “Discussion” of their original article promoting nesiritide that “standard therapy for decompensated congestive heart failure relies on the use of intravenous diuretics, dobutamine, milrinone, nitroglycerin, and sodium nitroprusside.” They then further state that a problem with nitroglycerin is the development of tolerance to the medication.28 This article was the first major clinical trial for nesiritide and contributed to the FDA approval of this treatment. However, many of the subsequent clinical trials, including Miller's study, did not compare nesiritide to nitroglycerin. As discussed above in the answer to question 1.A, the VMAC trial that compared the 2 was criticized because nitroglycerin was not titrated aggressively.12, 13

The FDA has strict requirements for the approval of a new drug. Their interactive Web page gives an informative overview of this process at http://www.fda.gov/cder/handbook/develop.htm. Briefly, a manufacturer initiates research and development of a new drug. Once the preclinical research, which may include animal studies, demonstrates that the new drug exhibits pharmacologic activity that justifies manufacture and is reasonably safe for small, clinical trials, an investigational new drug application is submitted. If the application is approved, phase I trials commence. These small (20 to 80 subjects) trials are aimed at assessing the safety and adverse effect profile of the medication, in addition to providing early efficacy information. Phase II trials are larger (several hundred subjects), closely monitored trials that test the efficacy and common adverse effects at a range of doses. Phase III trials are large randomized clinical trials that test whether the drug is efficacious in the targeted disease and whether it is not obviously harmful. Phase IV trials, conducted after the drug is approved, further evaluate the safety of a new medication and often seek to identify additional indications for the drug.29

Readers should understand that the FDA does not require that a manufacturer establish that a new agent is superior to existing agents. Manufacturers merely need to demonstrate the efficacy of a new drug by substantial evidence. “Substantial evidence” was defined in the 1962 Drug Amendments, Section 505(d) of the Act as “evidence consisting of adequate and well-controlled investigations, including clinical investigations, by experts qualified by scientific training and experience to evaluate the effectiveness of the drug involved, on the basis of which it could fairly and responsibly be concluded by such experts that the drug will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the labeling or proposed labeling thereof.”30 The guidelines state: “Trials should have an adequate control group. Comparisons may be made with placebo, no treatment, active controls, or of different doses of the drug under investigation. The choice of the comparator depends on, among other things, the objective of the trial.”31 Therefore, the trial's principal investigator and often the pharmaceutical manufacturer may decide what control group to use. The trial needs to demonstrate clinical efficacy and safety but does not need to show superiority over another approved treatment for a specific condition. It makes intuitive sense that a pharmaceutical-industry-sponsored trial, when ethically permissible, might elect to use a placebo control group rather than another active medication. The manufacturer's goal is to gain FDA approval of their new medication through studies that show the drug is safe and has some benefit in the treatment of a specific condition. In cases in which the use of a placebo would not be permissible for ethical reasons, investigators may choose any reasonable active treatment for that disease as the control. 
For instance, in the Miller et al study, the control was the institution's standard therapy for acutely decompensated heart failure, which consisted primarily of loop diuretics and did not include nitrates.

Let us consider a hypothetical phase III trial evaluating a new intravenous pain medication compared with intravenous morphine for relief of pain related to a long-bone fracture. This sounds like a reasonable comparison; but what if the morphine dose given were 2 mg for adults, whereas the new drug's dose was a weight-based calculation? If the investigators determined that their drug was equivalent or superior to morphine in the relief of pain, would you believe that conclusion? When trials that promote a new treatment are evaluated, it is imperative that physicians and health care administrators closely examine the choice of control groups. Was the selected control treatment a clinically appropriate comparison? If the choice of drugs was reasonable, did the investigators choose a commonly used dose? These are important questions that need to be addressed with any new treatment.

Q3.c Your hospital's pharmacy and therapeutics committee must decide whether to add nesiritide to your hospital's formulary. For a moment, put aside the business decision (could your hospital make more money by using this agent because it will be a profit center) and focus on whether nesiritide would be medically beneficial. In other words, are there sufficient benefits over current therapy to outweigh the reported associated adverse effects? The 2006 Heart Failure Society of America Heart Failure Practice Guidelines state, “Intravenous vasodilators (nitroprusside, nitroglycerin, or nesiritide) may be considered in patients with acutely decompensated heart failure and advanced HF [heart failure] who have persistent severe HF despite aggressive treatment with diuretics and standard oral therapies.” These recommendations are based not on the results of RCTs but on expert opinion (strength of evidence=C).7 Nitroglycerin is a frequently used medication in the treatment of decompensated congestive heart failure because it results in beneficial hemodynamics, prevents worsening of ischemic events, and is tolerated well without inciting dysrhythmias.1 Most emergency physicians and cardiologists are familiar with nitroglycerin's dosing and adverse effect profile. When treating patients with acutely decompensated heart failure, nitroglycerin should be aggressively titrated, and this requires close hemodynamic supervision. Nesiritide gained FDA approval in August 2001. Nesiritide is administered as a bolus dose followed by a continuous infusion that does not require titration. 
Although this could decrease labor costs, the savings are unlikely to compensate for the additional medication cost: at one institution, nesiritide was reported to cost 16 times as much as a 24-hour nitroglycerin infusion ($480 versus $30).12 As previously discussed, the VMAC trial found no difference in global clinical status between the 2 drugs at 3 hours after initiation of therapy.1, 13 There was also no benefit of nesiritide over nitroglycerin in reducing death or hospital readmission at 30 days.12 Furthermore, subsequent studies reported an association between the use of nesiritide and increased short-term risk of death and worsening renal function.2, 17 After these reports were published in prominent medical journals, there was a significant decrease in the use of nesiritide.4 This publication by Miller et al3 does not provide any additional support for using nesiritide. Therefore, despite nesiritide's less intensive dosing administration and reported short-term improvement in hemodynamic function, it seems that nesiritide's treatment benefits do not outweigh the possible associated increase in mortality and renal injury. If we add cost, it appears that nitroglycerin is an equivalent, if not superior, and far less expensive medication for the treatment of acutely decompensated heart failure.1, 2, 12, 13, 17

Answer 4 

Q4.a Trial registration is the process by which investigators publicly document their intentions before enrolling patients in a trial. Many journals, including Annals of Emergency Medicine, now insist that all trials be properly registered. There are many reasons why trial registration is now considered an essential element of the clinical research process. The 2 main ones are (1) to combat publication bias, and (2) to discourage investigators from altering the design or conduct of their study.

Publication bias results because positive trial results are more likely to get published than negative ones. When this occurs, systematic reviews of the published literature may fail to uncover negative trial results, which produces a biased summation of the evidence about the topic. By mandating that all trials be registered, the medical research community can decrease the likelihood of bias resulting from the differential publication of positive and negative study results because investigators will be able to identify unpublished trials and get information from the investigators. A recent article by Kirsch et al in PLoS Medicine provides one example of how publication bias may be leading the medical community astray.32

The other benefit of trial registration is that, when done properly, it forces the investigators to declare their primary outcome, inclusion and exclusion criteria, and sample size before initiating the study. This, in theory, should stop investigators from changing primary outcomes in midstream. Although difficult to prove, it is widely believed that some investigators will measure a number of different outcomes and then selectively report those with a favorable outcome. Prospective trial registration, by forcing authors to publicly declare their primary and secondary outcomes before the study begins, may curtail this process. Similarly, by locking authors into a predetermined study size, prospective trial registration should stop investigators from stopping a trial with positive results early (before it has a chance to wander back toward the null) or extending a trial in the hopes of reaching the ill-conceived but still highly sought after “statistically significant at P<.05.”
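The concern about measuring many outcomes and reporting only the favorable ones can be quantified with a simple calculation. The sketch below is ours and rests on a simplifying assumption stated in the code: the outcomes are independent and the treatment truly has no effect on any of them. The function name `chance_of_any_false_positive` is hypothetical.

```python
def chance_of_any_false_positive(n_outcomes, alpha=0.05):
    """P(at least one outcome reaches P < alpha) when the treatment has no
    true effect and the outcomes are independent (a simplifying assumption)."""
    return 1 - (1 - alpha) ** n_outcomes


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} outcomes: {chance_of_any_false_positive(k):.0%}")
```

Real trial outcomes are usually correlated, so the true figure will be somewhat lower than this calculation suggests, but the direction of the problem is the same: the more outcomes measured, the more likely that one of them appears “significant” by chance alone.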

Miller et al's trial registration can be found at http://clinicaltrials.gov/ct2/show/NCT00559338?id=NCT00559338&rank=1. We found this with a simple Google search “clinical trial registration,” which led us to clinicaltrials.gov, which we searched for “nesiritide emergency department,” which led us to the above link. There are countless other ways to find it. The most striking feature of this trial's registration is the date of registration, which is November 2007, several months after the article was submitted to this journal. In fairness to the authors, investigators were not routinely registering trials at the time this study was begun. As a consequence, however, we cannot be absolutely sure that what is registered precisely represents their intentions at the time the study began. That is why it is essential that trials be registered before the enrollment of subjects. Some would argue that the current registration template for most registries is inadequate and that investigators should be required to post a complete study protocol to a locked repository before conducting a study. The editorial staff at Lancet has encouraged authors to submit their protocols for peer review and offered guaranteed review of all completed articles whose protocol has been put through this peer review process.33

The registration pages for this article have many of the same ambiguities as the article, including vagueness about the nitroglycerin exclusion criteria (is it only intravenous nitroglycerin or all forms?) and uncertainty about how the exclusion criterion “suspected acute coronary syndrome” was implemented because worsening congestive heart failure symptoms could be considered a “history consistent with cardiac ischemia.”

Q4.b CONSORT is not companionship for lonely researchers (see http://consort.org) but the CONsolidated Standards Of Reporting Trials (see http://consort-statement.org). During the 1980s, it became evident that many reports of randomized trials contained insufficient information to permit proper evaluation of the trial's merits and meaning. In the early 1990s, 2 groups independently began working on reporting guidelines for trials. These groups combined forces and produced the first CONSORT statement in 1996. There have been 2 revisions, most recently in 2007.34

The CONSORT Web site provides comprehensive information about reporting guidelines in general and the specific elements in CONSORT. Readers are referred there.35 We briefly consider elements 6 to 11 in CONSORT.

6. Primary outcome: It is imperative that investigators clearly define the main outcome of a study before the study occurs. As the CONSORT statement notes:

“All RCTs assess response variables, or outcomes, for which the groups are compared. Most trials have several outcomes, some of which are of more interest than others. The primary outcome measure is the prespecified outcome of greatest importance and is usually the one used in the sample size calculation (see item 7). Some trials may have more than 1 primary outcome. Having more than 1 or 2 outcomes, however, incurs the problems of interpretation associated with multiplicity of analyses (see items 18 and 20), and is not recommended.”34

As discussed above, it is equally important that the primary outcome be defined at the trial registration so that readers can be sure that the primary outcome at the end of the study was the primary outcome at the beginning of the study.

7. Sample size: The means by which the study's size was determined should be clearly stated so readers can understand why the study was the size that it was. Whether the sample size was determined by a precision calculation or a power calculation (see answer to question 3.1 in the August 2008 issue36) the authors should precisely describe how this number was determined. This will typically require a clear statement of the difference that the investigators consider meaningful and wish to detect, the expected variability among subjects, and the investigators' requirements about precision, or power (trial sensitivity) and α level (trial specificity). Authors should also indicate what software (if any) was used to assist in the calculations. Regardless, if the reader cannot recreate the sample size calculation from the information presented, the calculation has been inadequately described.
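As an illustration of the ingredients listed in item 7, the sketch below shows one common textbook power calculation for comparing 2 proportions. It is not the calculation used by Miller et al; it assumes a 2-sided test, equal group sizes, and the normal approximation, and the function name `n_per_group` and the example event rates are our own.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Subjects needed per group to detect p_control vs p_treatment
    (2-sided test, equal groups, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # alpha level (trial specificity)
    z_beta = NormalDist().inv_cdf(power)            # power (trial sensitivity)
    # Expected variability among subjects for a binary outcome.
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_control - p_treatment                 # difference worth detecting
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    # Hypothetical example: event rate of 40% with control vs 30% with treatment.
    print(n_per_group(0.40, 0.30))
    print(n_per_group(0.40, 0.30, power=0.90))
```

Each argument corresponds to something the authors should report: the expected control rate, the difference they consider meaningful, the α level, and the power. A reader given those 4 numbers and the method can reproduce the sample size.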

8. Randomization: Generation: Authors should indicate how the randomization scheme was generated. As noted in the CONSORT statement:

… “[R]andom” is often used inappropriately in the literature to describe trials in which nonrandom, “deterministic” allocation methods, such as alternation, hospital numbers, or date of birth were used. When investigators use such a method, they should describe it exactly and should not use the term “random” or any variation of it. Even the term “quasi-random” is questionable for such trials. Empirical evidence indicates that such trials give biased results. Bias presumably arises from the inability to conceal these allocation systems adequately (see item 9).35

Investigators should explain why they used (or did not use) stratified randomization or randomization in blocks. Stratified randomization helps ensure that subjects in different strata of a variable will be equally represented in the limbs of a study. For example, in the Miller et al nesiritide trial,3 patients with severe shortness of breath and grossly abnormal vital signs might be randomized separately from those with less severe signs and symptoms. Stratification would diminish the probability that, by chance alone, the sicker patients were allocated to one arm of the study.

Blocking ensures that the N for intervention and control groups does not stray too far from the desired ratio. For example, if we conducted many experiments, each time randomizing 100 subjects to 2 groups with equal probability, we can expect that in about 5% of experiments there will be 60 or more subjects in one group and 40 or fewer in the other. If we block randomize in groups of 20, however, then every time 20 subjects are enrolled, we are assured that the number of subjects in the groups will be equal. Although it is possible that when 90 subjects are enrolled the number of subjects in the 2 groups could be as extreme as 50 versus 40, by the time 100 subjects are enrolled this should equalize at 50 versus 50. When block randomization is performed, the block size should be large enough to ensure that allocation concealment is not compromised. In other words, if one randomized in blocks of 2 and the treatment were identifiable (the patients who received treatment A always had facial flushing), then when the first patient in a block blushed, the investigator would know what the next patient was going to receive. As the blocks grow, this becomes less of a problem.
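The figures in the blocking example can be checked directly. The following sketch is ours: `prob_split_at_least` computes the exact binomial probability of a lopsided split under simple randomization, and `block_randomize` generates a permuted-block allocation list. Both function names, the group labels “A” and “B,” and the seed are assumptions made for illustration; they do not describe any particular trial's procedure.

```python
import random
from math import comb


def prob_split_at_least(n_total, n_larger):
    """Exact chance that simple 1:1 randomization of n_total subjects puts
    n_larger or more in one group (either group). Requires n_larger > n_total / 2."""
    tail = sum(comb(n_total, k) for k in range(n_larger, n_total + 1))
    return 2 * tail / 2 ** n_total


def block_randomize(n_subjects, block_size, seed=0):
    """Permuted-block allocation list with equal numbers of "A" and "B"
    in every complete block."""
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)          # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]


if __name__ == "__main__":
    print(f"P(60 or more in one group of 100): {prob_split_at_least(100, 60):.1%}")
    alloc = block_randomize(100, 20)
    print("after  90 enrolled:", alloc[:90].count("A"), "vs", alloc[:90].count("B"))
    print("after 100 enrolled:", alloc.count("A"), "vs", alloc.count("B"))
```

The allocation list is balanced at every multiple of 20 and can be no more lopsided than 50 versus 40 partway through the final block, as described above.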

9. Allocation concealment is a crucial component of a randomized trial. Imagine that a perfectly random allocation schedule is developed (a list that says “treatment, treatment, treatment, control, treatment, control…”) and is posted on the wall of the room where recruitment takes place. Each time a patient is enrolled, the instructions are followed and that word is crossed out. A study conducted in this manner would be randomized but is highly prone to selection bias. It would be obvious to both patient and provider what arm of the study this patient will be assigned to. This knowledge would affect the patient's probability of enrolling and therefore could confound the study (for example, sicker patients might not wish to receive placebo).

It is therefore important to ensure that neither the subject nor the persons responsible for enrolling the subject have knowledge of what group the patient will be in. Enrollment first, allocation after. There are many methods of concealing allocation, and the study should clearly describe what method was used. On the other hand, it is unclear whether reporting guidelines truly help us judge whether allocation was actually concealed. Imagine for a minute that a study uses numbered envelopes and for each new enrollee the next envelope is opened to reveal a slip of paper that indicates group assignment. Assume that these envelopes are fairly flimsy and that by holding the envelope up to the light the investigator can read the group assignment. Do you really believe that the investigator/author will submit an article that says, “by mistake we used flimsy envelopes that, when held to the light, revealed group assignment”? Sorry to be misanthropic but we do not expect to see such text anytime soon. Instead, we always get, “treatment assignments were contained in sequential opaque manila envelopes…” Readers can consider whether reporting guidelines truly improve the reporting of science or whether they simply encourage a series of white lies that spiff up the article for journal peer reviewers and editors but render the article an inaccurate representation of what actually transpired.

10. Randomization implementation: The authors should describe how each subject was allocated to a group. Were different individuals responsible for enrollment and allocation? Is there any possibility for corruption of the enrollment and allocation process?

11. Blinding and masking: Much has been written on this subject and we will consider it again in future journal clubs. Briefly, just as those enrolling patients should not know what group each patient will be assigned to, so also those treating or assessing patients should not know what group the patient is in. This is called blinding. One might blind the patients (so they do not know what group they are in), the practitioners, the outcome assessors, and the data analysts. Blinding decreases the possibility of confounding but it is not always feasible. How do we blind a trial of laminectomy versus physical therapy for lumbar disk herniation? One cannot ethically perform sham surgery. There are also times when blinding is not crucial. For example, if the outcome assessor is performing a 40-minute in-person interview to assess the effect of a new treatment for social phobia, then blinding of the assessor would be crucial. However, if the assessor's task was to determine whether the patient was dead or alive (according to physical examination), then blinding would be less crucial as misclassification of the outcome is unlikely, even if the assessor is unblinded. Thus, like all things in clinical research, there is a general principle—blinding is good; do it when feasible—but no absolute rule. The specifics of the situation will dictate the feasibility and importance of blinding each member of the research process.

Q4.c 6. Primary outcome: Although the abstract's “Objective” section states: “We examined the effect of an 8-hour infusion of nesiritide on the composite of return to the ED or hospitalization at 30 days,” there is no clear definition of the primary outcome in the “Methods” section of the article. Nevertheless, it is clear that this is the investigators' intent. From a reporting perspective, this article successfully identifies the primary outcome. A more interesting question is whether this outcome makes sense for this type of intervention. This is addressed in question 5a.

7. Sample size: The authors write:

“Previous work in this population demonstrated a 60% rate of readmission at 30 days.12 The target sample size for the study was based on the previously established event rate of 60% readmission at 30 days. The minimum sample size was prospectively determined to be 104 patients, which represents the minimum number of patients needed to provide 80% power (α=0.05) to detect a 30% relative reduction in readmission from 60% to 42% at 30 days postinfusion in the acutely decompensated congestive heart failure patients receiving an 8-hour intravenous infusion of nesiritide.19”

In STATA 10 we type:

and find that:

How did the authors get 52 (104/2) patients per group?

We try

and find that:

Now 106 is pretty close to 104 but (a) there is no reason to assume that a 1-sided test will be used—we do not know with certainty that nesiritide must be the same as or better than standard therapy and cannot be worse—and (b) that is 106 patients per group, not 106 patients total.
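
Readers without Stata can check the calculation with the sketch below. It is not the authors' or our original command. It assumes a standard normal-approximation formula for comparing 2 proportions with a continuity correction, a choice made here because it reproduces the 106 per group quoted above for the 1-sided case. The function name and its parameters are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Patients needed in EACH arm (1:1 allocation) to compare two proportions,
    using the normal approximation with a continuity correction (assumed formula)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    # Uncorrected sample size per group
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    # Continuity correction inflates the uncorrected n
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_corrected)


two_sided_n = n_per_group(0.60, 0.42)
one_sided_n = n_per_group(0.60, 0.42, two_sided=False)
print("2-sided test:", two_sided_n, "per group,", 2 * two_sided_n, "total")
print("1-sided test:", one_sided_n, "per group,", 2 * one_sided_n, "total")
```

Under these assumptions the 2-sided calculation calls for 131 patients per group (262 in total) and the 1-sided calculation for 106 per group (212 in total). Either way the requirement is far above 104 patients in total.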

Journal editors and readers alike should be humbled that the peer review process (which included an author of this journal club) failed to detect this error. We direct readers to an excellent article by Goodman and Berlin,37 which explains how to interpret information from studies that enroll fewer than the desired number of subjects (hint: they recommend against calculating the power according to the actual N!).

8, 9, and 10. Randomization: Generation and Implementation and Allocation Concealment: The relevant part of the “Methods” section is:

Patients presenting to the ED were screened by the ED staff. Those identified provided consent and were randomized with a block randomization table in blocks of 20 (Microsoft Excel 1997; Microsoft, Redmond, WA). The assignment of randomization was determined by the pharmacy, and medications were dispensed to the clinical trial office nurse for administration to the patient in the ED.20

Excel has a function called RAND that produces random numbers between 0 and 1. Presumably, the authors generated a list of these numbers and replaced those less than 0.5 with “nesiritide” and those more than 0.5 with “placebo” or some similar strategy. This seems reasonable. Although the pharmacist may have known what drug the next patient would receive, it seems that the investigators have taken steps to ensure that the person enrolling the subject did not. Thus, allocation concealment and randomization in this study seem adequate.

11. Blinding: The authors indicate that “Blinding of all of the providers, including a clinical trials nurse, was maintained throughout the trial.” We could not find any other comments about blinding or masking. We presume that patients were blinded as well but do not know whether those abstracting the charts for mortality or rehospitalization were blinded nor whether those doing the statistical analysis were blinded. We do not know whether “a clinical trials nurse” means “the one clinical trials nurse” or “one of several clinical trials nurses.”

We also do not know how successful blinding was. Could patients tell which solution they were receiving because the nesiritide drip caused burning in the arm, a metallic taste, or some other sign? Could nurses and investigators tell because those who received the nesiritide bolus always had a decrease in their blood pressure? This is not to say that any of this happened, but the article does not assert that it did not. Some studies ask participants (patients or providers) to guess what limb the patient is in. When guesses are right 50% of the time, one knows that blinding was successful!
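
One way to turn such guesses into a number is an exact binomial test against a 50% correct-guess rate. The sketch below is an illustration only. The function name is hypothetical, and the counts are invented for the example; they are not data from the Miller et al trial.

```python
from math import comb


def blinding_guess_p_value(correct, total):
    """Exact two-sided binomial p-value for the null hypothesis that
    guesses about group assignment are correct 50% of the time."""
    lower = sum(comb(total, k) for k in range(0, correct + 1)) / 2 ** total
    upper = sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total
    return min(1.0, 2 * min(lower, upper))


# Hypothetical counts, NOT data from the trial
p_chance = blinding_guess_p_value(50, 100)     # guesses right half the time
p_unblinded = blinding_guess_p_value(70, 100)  # guesses right 70% of the time
print(f"50/100 correct: p = {p_chance:.3f}")
print(f"70/100 correct: p = {p_unblinded:.5f}")
```

A proportion near 50% is compatible with intact blinding, whereas 70 correct guesses out of 100 would be very unlikely under chance alone. With only a handful of guesses, a test like this has little ability to detect unblinding.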

Thus, with the exception of the sample size issue, this article does a good job of convincing us that items 6 to 11 in CONSORT were fairly well handled. As discussed in other parts of this journal club, issues about inclusion and exclusion criteria, power, and the chosen comparative arm are greater threats to this study than are randomization, allocation concealment, and blinding.

Answer 5 

Q5.a This randomized trial of 101 patients with acutely decompensated congestive heart failure treated at an urban ED reports that those treated with an 8-hour infusion of nesiritide in addition to this ED's standard treatment had a readmission rate approximately 2 percentage points higher than that of those given placebo plus standard treatment. This result provides no support for the use of nesiritide in the ED.

We must consider, however, whether the outcome measure was reasonable. The authors write:

“This dramatic decrease in readmission rates merits additional comment. In concert with initiation of this clinical trial, several process-of-care improvement initiatives were started: a standard admission order set was introduced in the ED for patients with decompensated heart failure, discrete criteria were established to govern hospitalization for decompensated heart failure versus discharge to home with clinic follow-up, and a heart failure clinic was available to facilitate outpatient care.”

Given this, is the authors' hypothesis that an 8-hour infusion of nesiritide would decrease 30-day ED recidivism and readmission rates a reasonable one? It would seem that in an underserved, urban, county hospital population with a 60% heart failure readmission rate, the cointerventions would have a greater effect on 30-day outcome than a nesiritide infusion on day 1. We are surprised that the authors did not target more focused outcomes such as need for hospital admission, hospital length of stay, and shorter-term outcome measures. It is possible that nesiritide might have had a positive effect on these outcomes. Although this article does not make much of a case for nesiritide, the fact that readmission rates in both limbs decreased substantially (41.5% and 39.6%) from historical rates (60%) suggests, but by no means proves, that the cointerventions were useful.

Q5.b Colucci et al11 state in their New England Journal of Medicine 2000 article heralding nesiritide that BNP is made in the myocytes of the ventricle and the circulating levels increase in patients with acutely decompensated heart failure.11 Nesiritide, a recombinant form of BNP, is reported to have beneficial hemodynamic actions in patients with acutely decompensated heart failure. These include arterial and venous dilatation, increased excretion of sodium, and suppression of both the renin-angiotensin-aldosterone and sympathetic nervous systems.11

The rationale behind the efficacy of nesiritide is that although levels of human BNP are increased in acutely decompensated heart failure, the increase of endogenous BNP is insufficient to promote clinically significant diuresis, natriuresis, and vasodilation.13 Therefore, the infusion of nesiritide supplements the endogenous BNP in patients with acutely decompensated heart failure. Readers might consider whether there is strong evidence that, despite its increase in heart failure, BNP levels are still insufficient or whether it is possible that an inadequate response to the BNP that is circulating is the primary problem in heart failure.

Q5.c A more detailed description of the inclusion/exclusion criteria would assist the reader with the critical interpretation of these results. Were only patients who were given intravenous nitroglycerin excluded or were patients who received nitroglycerin in other forms also excluded? Why were these patients excluded? How did the investigators determine whether a patient was significantly more overloaded than their baseline?

Additionally, it would be of interest to know what the ED return rate and readmission rates were for the heart failure patients treated with the new protocol who were not enrolled in this study.

References 

  1. Publication Committee for VMAC Investigators. Intravenous nesiritide vs nitroglycerin for treatment of decompensated congestive heart failure: a randomized controlled trial. JAMA. 2002;287:1531–1540
  2. Sackner-Bernstein JD, Kowalski M, Fox M, et al. Short-term risk of death after treatment with nesiritide for decompensated heart failure: a pooled analysis of randomized controlled trials. JAMA. 2005;293:1900–1905
  3. Miller AH, Nazeer S, Pepe P, et al. Acutely decompensated heart failure in a county emergency department: a double-blind randomized controlled comparison of nesiritide versus placebo treatment. Ann Emerg Med. 2008;51:571–578
  4. Hauptman PJ, Schnitzler MA, Swindle J, et al. Use of nesiritide before and after publications suggesting drug-related risks in patients with acute decompensated heart failure. JAMA. 2006;296:1877–1884
  5. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48
  6. Glymour MM, Greenland S. Causal diagrams. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott; 2008;p. 183–212
  7. Heart Failure Society of America. HFSA 2006 comprehensive heart failure practice guideline. J Card Fail. 2006;12:e1–e2
  8. Dao Q, Krishnaswamy P, Kazanegra R, et al. Utility of B-type natriuretic peptide in the diagnosis of congestive heart failure in an urgent-care setting. J Am Coll Cardiol. 2001;37:379–385
  9. Maisel AS, Krishnaswamy P, Nowak RM, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med. 2002;347:161–167
  10. Mills RM, LeJemtel TH, Horton DP, et al; Natrecor Study Group. Sustained hemodynamic effects of an infusion of nesiritide (human b-type natriuretic peptide) in heart failure: a randomized, double-blind, placebo-controlled clinical trial. J Am Coll Cardiol. 1999;34:155–162
  11. Colucci WS, Elkayam U, Horton DP, et al. Intravenous nesiritide, a natriuretic peptide, in the treatment of decompensated congestive heart failure (Nesiritide Study Group). N Engl J Med. 2000;343:246–253
  12. Topol EJ. Nesiritide—not verified. N Engl J Med. 2005;353:113–116
  13. Collins SP, Hinckley WR, Storrow AB. Critical review and recommendations for nesiritide use in the emergency department. J Emerg Med. 2005;29:317–329
  14. Marx J, Hockberger R, Walls R. Rosen's Emergency Medicine. Concepts and Clinical Practice. 5th ed. St. Louis, MO: Mosby; 2002;1122
  15. Burger AJ, Horton DP, LeJemtel T, et al. Effect of nesiritide (B-type natriuretic peptide) and dobutamine on ventricular arrhythmias in the treatment of patients with acutely decompensated congestive heart failure: the PRECEDENT Study. Am Heart J. 2002;144:1102–1108
  16. Yancy CW, Saltzberg MT, Berkowitz RL, et al. Safety and feasibility of using serial infusions of nesiritide for heart failure in an outpatient setting (from the FUSION I trial). Am J Cardiol. 2004;94:595–601
  17. Sackner-Bernstein JD, Skopicki HA, Aaronson KD. Risk of worsening renal function with nesiritide in patients with acutely decompensated heart failure. Circulation. 2005;111:1487–1491
  18. Peacock W, Holland R, Gyarmathy R, et al. Observation unit treatment of heart failure with nesiritide: results from the Proaction Trial. J Emerg Med. 2005;29:243–252
  19. Peacock WF. Initial results from the PROACTION Study. J Emerg Med. 2006;31:435–436
  20. Saul S. Heart drug maker will lay off 150 (New York Times. February 24, 2006). http://www.nytimes.com/2006/02/24/business/24scios.html. Accessed May 20, 2008
  21. Mehrotra L, Weldon B, Caruso D. JNJ-Q4 2007 Johnson & Johnson earnings conference call: final transcript. In: Thomson Financial. Thomson StreetEvents; 2008;p. 1–25
  22. Hennessy S. Use of health care databases in pharmacoepidemiology. Basic Clin Pharmacol Toxicol. 2006;98:311–313
  23. Pearl J. Causality. New York, NY: Cambridge University Press; 2000
  24. Greenland S, Rothman KJ, Lash TL. Measure of effect and measures of association. In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott; 2008;p. 56–60
  25. Altman DG, Dore CJ. Randomisation and baseline comparisons in clinical trials. Lancet. 1990;335:149–152
  26. Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994;13:1715–1726
  27. Diercks DB. Can we improve treatment of heart failure in the emergency department? Ann Emerg Med. 2008;51:583–584
  28. Packer M, Lee W, Kessler P, et al. Prevention and reversal of nitrate tolerance in patients with congestive heart failure. N Engl J Med. 1987;317:799–804
  29. Hulley S, Cummings S, Browner W, et al. Designing Clinical Research. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2007;168–169
  30. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research. Guidance for Industry. Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products. US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research; 1998:1-23.
  31. US Department of Health and Human Services, Food and Drug Administration, eds. International Conference on Harmonisation: guidance on general considerations for clinical trials. Fed Reg. 1997;66113–66119
  32. Kirsch I, Deacon BJ, Huedo-Medina TB, et al. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med. 2008;5:e45;doi:10.1371/journal.pmed.0050045
  33. McNamee D. Review of clinical protocols at The Lancet. Lancet. 2001;357:1819–1820
  34. http://www.consort-statement.org/index.aspx?o=1210. Accessed April 20, 2008
  35. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med. 2001;134:657–662
  36. Barrett TW, Schriger DL. Practical considerations in HIV testing in the emergency department, characteristics of diagnostic tests, and the role of sensitivity analysis in observational studies: answers to March 2008 Journal Club questions. Ann Emerg Med. 2008;52:170–181
  37. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121:200–206

 Editor's Note: You are reading answers to the third installment of Annals of Emergency Medicine Journal Club. The questions and the article they are about (Miller et al. Ann Emerg Med. 2008;51:571-578.) were published in the May 2008 issue. Information about journal club can be found at http://www.annemergmed.com/content/journalclub. Readers should recognize that these are suggested answers. We hope they are accurate; we know that they are not comprehensive. There are many other points that could be made about these questions or about the article in general. Questions are rated “novice,” “intermediate,” and “advanced” so that individuals planning a journal club can assign the right question to the right student. The novice rating does not imply that a novice should be able to spontaneously answer the question. “Novice” means we expect that someone with little background should be able to do a bit of reading, formulate an answer, and teach the material to others. Intermediate and advanced questions also will likely require some reading and research, and that reading will be sufficiently difficult that some background in clinical epidemiology will be helpful in understanding the reading and concepts. We are interested in receiving feedback about this feature. Please e-mail journalclub@acep.org with your comments.

PII: S0196-0644(08)00787-7

doi:10.1016/j.annemergmed.2008.05.001
