
Volume 52, Issue 6, Pages 754-763 (December 2008)



Journal Club
Risk Prediction With Procalcitonin and Clinical Rules in Community-Acquired Pneumonia: Answers to the July 2008 Journal Club Questions

Michael D. Menchine, MD, MPH; David L. Schriger, MD, MPH (Section Editor); Tyler W. Barrett, MD (Section Editor)

Refers to article:
Journal Club questions: Risk Prediction With Procalcitonin and Clinical Rules in Community-Acquired Pneumonia, 17 March 2008
David T. Huang, Lisa A. Weissfeld, John A. Kellum, Donald M. Yealy, Lan Kong, Michael Martino, Derek C. Angus, GenIMS Investigators
Annals of Emergency Medicine
July 2008 (Vol. 52, Issue 1, Pages 48-58.e2)

Article Outline

Discussion Points

Answer 1

Answer 2

Answer 3

Answer 4

Answer 5

References

Copyright

Discussion Points 



1. When an investigation is designed, it is helpful to have a theoretical model of the problem. A. In this study, the authors chose to look at the relationship of procalcitonin and pneumonia. What is already known about procalcitonin levels and infection? Can the procalcitonin level differentiate between viral and bacterial disease? Is procalcitonin level correlated with outcome in bacterial illness? B. Create a schematic conceptual model that shows how procalcitonin and Pneumonia Severity Index (PSI) are related to pneumonia outcomes. What are the likely shapes of these relationships? Procalcitonin level can be treated as a continuous variable, can be divided (as these authors do) into several categories, or can be treated as a binary (low, high) variable. What are the advantages and disadvantages of each approach? According to your model, do you expect progressively higher procalcitonin levels to correlate with progressively worse outcomes or do you expect normal procalcitonin levels to predict good outcomes and abnormal values to have similar frequencies of adverse outcome regardless of the magnitude of the elevation?

2. What are the advantages and disadvantages of choosing all-cause death at 30 days as the primary outcome of interest? When emergency physicians determine the disposition (home, regular bed, monitored bed, ICU) of pneumonia patients, are they thinking about 30-day mortality or something else? What other outcomes might be of interest? What assumptions must be made about hospital admission to justify the use of 30-day mortality as an outcome in this study? Are these assumptions likely to be correct?

3. The authors use likelihood ratios to describe their results (their Table 3). What is a positive likelihood ratio (LR+)? A negative likelihood ratio (LR−)? Contrast likelihood ratios to other measures of test performance and describe their advantages and disadvantages. Why, in theory at least, are likelihood ratios particularly useful at the bedside? What does the LR− of 0.09 (95% confidence interval [CI] 0.02 to 0.36) found in this study mean quantitatively and qualitatively? What numbers is it based on (calculate it!)?

4. Multicenter studies make it possible to enroll large numbers of subjects and offer a greater chance for external generalizability but present analytic challenges. What are some of the analytic challenges that arise from multicenter studies and what are some of the techniques used to overcome these? Consider issues about the presentation of results and the statistical analysis of the data. In this study, what information about the role of individual study sites would help readers understand and interpret the meaning of the results?

5. What would you choose as the next step in evaluating the effect of procalcitonin testing on pneumonia patients? Should we start using it in clinical practice and see how we like it? Should we test it in an external validation set (how is this done)? Should we conduct a randomized controlled trial? How would you design such a trial? What would the intervention be? What would the outcome of interest be?

Answer 1 


Q1.a In this study, the authors chose to look at the relationship of procalcitonin and pneumonia. What is already known about procalcitonin levels and infection? Can the procalcitonin level differentiate between viral and bacterial disease? Is procalcitonin level correlated with outcome in bacterial illness?

Theoretical model building is the process wherein researchers begin to synthesize information from the medical literature, observations from their own laboratories, and even their own hunches into a coherent and simplified explanation of a complicated phenomenon. The relationships of relevant variables are described in terms of magnitude and direction. These models can be specified graphically, mathematically, or by narrative. Once established, the model helps researchers plan investigations that will test the relationships predicted by the model against observable data. If the data refute the model or part of the model, the model is adjusted or abandoned. Although this process (observe, build theory, test, and repeat) is an essential part of many scientific disciplines, the medical literature has been largely devoid of specific discussions of model building. The absence of discussion of theoretical models can leave readers confused about why the investigators chose to examine a certain variable compared with another seemingly equivalent choice.2

Let's consider what we know about procalcitonin (based solely on the Huang et al “Introduction” section) and build a theoretical model that will relate procalcitonin levels to the study's primary outcome, 30-day all-cause mortality.

What do we know?


1) Procalcitonin is increased in bacterial infections but low in viral infections.3

2) Procalcitonin has good discrimination for bacterial infections.4, 5, 6, 7

3) Three trials used low procalcitonin levels to withhold antibiotics in emergency department (ED) patients presenting with respiratory symptoms.8, 9, 10

4) Two meta-analyses concluded that procalcitonin could not differentiate sepsis from noninfectious inflammation in critically ill patients, and procalcitonin had only moderate diagnostic performance at identifying bacteremia in ED patients.11, 12

5) Higher procalcitonin levels have been observed to associate with higher Pneumonia Severity Index (PSI) scores in one study but not in another.13, 14

The first 3 points suggest that procalcitonin may be increased in bacterial infections but not in viral infections. The fifth point demonstrates that higher PSI has not been consistently associated with higher procalcitonin level.

Q1.b Create a schematic conceptual model that shows how procalcitonin and Pneumonia Severity Index are related to pneumonia outcomes. What are the likely shapes of these relationships? Procalcitonin level can be treated as a continuous variable, can be divided (as these authors do) into several categories, or can be treated as a binary (low, high) variable. What are the advantages and disadvantages of each approach? According to your model, do you expect progressively higher procalcitonin levels to correlate with progressively worse outcomes or do you expect normal procalcitonin levels to predict good outcomes and abnormal values to have similar frequencies of adverse outcome regardless of the magnitude of the elevation?

At this point, it would be appropriate to assert the model's general form, that is, to list the variables we believe are related to the phenomenon of interest and to state how they are related. In this case:


There is a relationship between procalcitonin and severe bacterial infections.

There is a relationship between PSI score and procalcitonin level.

There is a relationship between procalcitonin and death.

It may be necessary or helpful to specify the causal pathways involved in the model. For example, pneumonia results in endotoxin production, which induces an inflammatory response in the host. This inflammatory response can lead to coagulopathy, vital sign disturbances, organ dysfunction, and death. The intensity of this process is mediated by patient characteristics such as age, sex, and previous health status (comorbidities). Indicators of the inflammatory response include procalcitonin and WBC count. Though procalcitonin does not directly kill the host, it may serve as a marker for the severity of the inflammatory response. The components of PSI score include elements that mediate the illness (eg, age, comorbidities), indirect markers of inflammation, and direct markers of end organ damage (eg, systolic blood pressure, blood urea nitrogen).

Typically, the general scheme relating all relevant elements is specified in a diagram. Such diagrams can be informal (Figure 1) or can follow the conventions used for causal diagrams.2, 15 Once the general relationships among variables are mapped out, the nature of the relationships among sets of variables can be specified. Is procalcitonin thought to be linearly related to PSI score? How exactly is procalcitonin thought to relate to all-cause mortality? Is this relationship linear? Logistic? J-shaped? Over what range of values is each of these specified relationships maintained? It is likewise important to consider the interaction among variables thought to influence the outcome of interest. For example, PSI is related to all-cause mortality and procalcitonin is related to all-cause mortality. When viewed together, how are PSI and procalcitonin jointly thought to relate to all-cause mortality? Does procalcitonin level have the same predictive effect on mortality across all values of PSI? It is of great importance that variables not included in the model be addressed because omitting a variable from a model is equivalent to saying, “I am 100% certain that this variable has no direct or indirect effects on outcome.” Why was the number of lung lobes affected by the pneumonia not included? Surely this could influence mortality. The authors should discuss the rationale for excluding seemingly important variables from the model.



Figure 1. General relationships of relevant variables. PSI, Pneumonia Severity Index; CURB-65, Confusion, Uremia, Respiratory Rate, low blood pressure, age 65 or older; AMS, altered mental status; low BP, low blood pressure; Temp, temperature; RR, respiratory rate; HR, heart rate; Na, sodium level; HCT, hematocrit; Gluc, glucose; BUN, blood urea nitrogen; pO2, pulse oximetry.


Finally, each variable in the model must be specified and operationalized. Is age to be treated as a continuous variable? Is a change in age from 18 to 28 years to be considered equivalent to a change in age from 60 to 70 years? Should procalcitonin be treated as a continuous variable associated with the ordinal variable PSI? Should procalcitonin be converted to an ordinal variable and compared with the ordinal PSI class? At each step of the model generation, from form to specification, the rationale for choices should be described. Ideally, previous theory and data should dictate model generation, but some inductive process is common and should be explained and justified. Once the theoretic model is laid down, specific hypothesis generation is appropriate. In this example, there are many ways one could construct this model because we have so little information to deduce it from. Depending on the form and specifications one chose, a great number of questions could arise. The authors do not explicitly detail their model but hypothesize that “an early singular procalcitonin measurement would aid risk assessment beyond that available from the PSI.”

According to the model we constructed, we might have hypothesized “that normal procalcitonin levels will predict cases that are later proved to not be pneumonia. Further, because these patients do not have pneumonia, we hypothesize that their 30-day mortality will be much lower than that of those proven to have pneumonia.” An alternative hypothesis (based on point 5) could be that “higher procalcitonin levels may associate with higher PSI or CURB-65 scores and, by extension, mortality.”

The authors could have dichotomized the procalcitonin results into normal and abnormal a priori. Instead, they chose to divide procalcitonin into tiers. Why? For starters, this follows the lead of 3 previous studies. But this alone is a poor rationale because none of the previous studies provide justification for this choice. A more cogent reason for choosing a tiered approach is to attempt to demonstrate a dose-response relationship between procalcitonin level and mortality. In general, the presence of such a relationship makes it more likely that there is a biological link between the variables, eg, between the processes that increase procalcitonin and the processes that lead to death. For example, the more one smokes, the higher the risk of developing lung cancer. Given the authors' decision to maintain the tiers, what do you suppose the mortality should be for patients in tier 1 compared with tiers 2, 3, and 4? Indeed, tier 1 has a low mortality, at 1.5% (Huang et al, Table 2). However, the dose-response relationship is quickly lost: tiers 2, 3, and 4 have mortalities of 8.4%, 9.5%, and 8.9%, respectively. The dose-response relationship is further undermined by the Huang et al figures (Figures 2 and 3), which show, again, low mortality for tier 1 procalcitonin level but no relationship between mortality and the higher procalcitonin tiers.



Figure 2. Fagan Nomogram.


Given the authors' decision to divide procalcitonin levels into tiers, it is somewhat surprising that the absence of a dose-response relationship is not emphasized. Rather, the procalcitonin levels are repackaged as normal and abnormal. The newly dichotomized procalcitonin level is then analyzed within each of the 5 PSI class subgroups. The authors then claim that the procalcitonin level is a useful adjunct in PSI classes IV and V. This relationship does not appear to have been hypothesized a priori, and we are left to wonder whether the authors, having observed their Figures 2 and 3, developed a post hoc theory that normal procalcitonin levels predict good outcomes in patients with high PSI class. If this is the case, such findings must be viewed as hypothesis generating, and no conclusions about their validity should be drawn. Much has been written about the dangers of post hoc subgroup analysis,16, 17 and this topic will be considered in detail in a future journal club.

Consider that instead of emphasizing the results observed in PSI classes IV and V, the authors could have emphasized the relationship of procalcitonin to mortality in PSI classes I to III. For these PSI classes, the positive likelihood ratio (LR+) is 0.97, suggesting that a high (or positive) procalcitonin level, if anything, slightly reduces the odds of an adverse outcome (see the answer to question 3 for a more detailed discussion of likelihood ratios). This is dramatically contrary to the stated hypothesis that “an early singular procalcitonin measurement would aid risk assessment beyond that available from the PSI.” Why, then, do the authors believe the procalcitonin level is of clinical value? A detailed theoretical model specifying the manner in which dichotomized procalcitonin level was thought to supersede the PSI score in class IV and V cases but not in class I to III cases might allow for more enthusiastic acceptance of the study results. In the absence of this model, the reader is left wondering whether the curious observation that a normal procalcitonin level is associated with good outcomes despite a high PSI class is a tested truth or random noise.

Answer 2 


Q2. What are the advantages and disadvantages of choosing all-cause death at 30 days as the primary outcome of interest? When emergency physicians determine the disposition (home, regular bed, monitored bed, ICU) of pneumonia patients, are they thinking about 30-day mortality or something else? What other outcomes might be of interest? What assumptions must be made about hospital admission to justify the use of 30-day mortality as an outcome in this study? Are these assumptions likely to be correct?

The selection of outcomes is a critical and necessarily difficult tradeoff in most clinical studies. Researchers must balance the ease of measurement of a particular outcome against the clinical meaning of that outcome. Much has been written about the importance of patient-centered outcomes in clinical research (eg, an asthma study in children should measure days missed from school or patient symptom self-report, not change in forced expiratory volume in one second [FEV1]), and such outcomes are preferred even if they are harder to measure. In the Huang et al study, the researchers chose all-cause 30-day mortality as the outcome of interest. This outcome is intuitively appealing; it is easy to measure and should have high reliability and validity; it is unlikely there will be much interrater disagreement about whether a person is alive or not! Further, it is clinically obvious that we want to keep people alive, so mortality often seems to be the most important outcome measure possible. Certainly in a randomized clinical trial of different antibiotic agents for pneumonia patients, mortality would be a relevant outcome variable, and 30-day mortality is preferred over shorter-term measures (3-day or 7-day mortality) because it has greater clinical meaning.

But is 30-day all-cause mortality the best primary outcome measure for the Huang et al study? Do we truly believe that an initial procalcitonin level obtained in the ED will affect 30-day all-cause mortality? The causal chain would run thus: early knowledge of the procalcitonin level leads to a change in initial management that affects all-cause 30-day mortality. But does our ED treatment of pneumonia have that much influence on 30-day all-cause mortality? Would not the many patients for whom pneumonia is the penultimate stop on the way to a death from other causes obscure any true benefit? Will the outcome of a patient with advanced cancer who develops a mild pneumonia, is treated with antibiotics, improves, but dies of pulmonary embolism 2 weeks later be affected by the measurement of procalcitonin in the ED?

Would it be better to shorten the timing of the outcome or change the outcome from all-cause mortality to death from pneumonia? Surely if we cannot observe a change in 3- or 7-day mortality from pneumonia, then it is unlikely that 30-day all-cause mortality will be affected. Conversely, if we did observe a change in a shorter-term, pneumonia-specific outcome, then we might go out to 30 days to determine whether the difference in mortality is sustained. In other words, are a sufficient number of deaths averted at 30 days to make the measurement of procalcitonin worthwhile?

Although it might be more telling to determine whether PSI, CURB-65, or procalcitonin was able to predict pneumonia-specific complications and death, such outcomes are more difficult to measure than all-cause mortality because an expert must stand in judgment and determine whether patients' deaths were a result of the pneumonia. Whatever process this expert used will be much more subjective than simply counting the bodies. A second expert would likely be required to judge each case, and the agreement between the judges would then be determined. A third judge may be required in cases of disagreement, and even if there is agreement between the judges, reviewers and critics will be suspicious of the validity of the methods used to determine death causes. So we see here the tradeoff between clinical importance and feasibility.

The authors present data demonstrating that increasing CURB-65 and PSI scores predict increasing risk of death at 30 days, as does procalcitonin level (Huang et al, Table 2, page 52). But how should clinicians use this knowledge? The implication has been plainly put forth by the PSI developers: patients in PSI classes I and II should be discharged, those in classes IV or V should be admitted, and those in class III could be either discharged or admitted.18 For this to be a rational plan, we must assume that the act of hospitalizing a patient somehow independently affects the death rate. Conversely, using a low score to justify discharge assumes that the sole purpose of hospitalization is to prevent 30-day mortality and that the medical care provided during hospitalization of a patient in class I or II was not the very reason that this lower-risk patient did not die.

Consider these typical, though hypothetical, examples:

Case 1) An 84-year-old man with advanced prostate cancer presents to the ED with fever and mild, nonproductive cough for several days. He appears chronically ill, has fever to 100.8°F (38.2°C), but is in no respiratory distress. He is at his baseline mental status, which is confused and lethargic. A chest radiograph confirms right-sided lower-lobe pneumonia without effusion. He has a peripherally inserted central catheter (PICC) line, through which he gets intermittent intravenous fluids, and has a home health nurse. This individual has a very high PSI of 114 (class IV); no doubt his risk of death at 30 days from any cause is enormous. Even his risk of pneumonia-specific mortality is very high. However, after a brief conversation with his family and primary care physician, you determine that intravenous antibiotics through the PICC line and oxygen can be safely administered at home and his family is more than capable of monitoring him for clinical deterioration.

Case 2) A 24-year-old previously healthy graduate student presents with rigors, shaking chills, and a deep, productive cough for 12 hours. He appears uncomfortable and ill. His respiratory rate is 26 breaths/min. Pulse oximetry is 91%. Chest radiograph confirms a dense right middle-lobe consolidation. His WBC count is 22,000/µL, but other laboratory results are unremarkable. He lives alone in a walk-up third-story apartment, with no close friends or family in the area. His PSI score is 50 (class I).

These 2 cases illustrate opposite ends of the spectrum. The elderly patient is unlikely to receive any additional benefit from inpatient hospitalization. Indeed, given our current understanding of medical error and iatrogenic injury, hospitalization is more likely to harm than help this patient. Conversely, the young graduate student has a low PSI score, indicating a low risk for 30-day mortality. Would you consider admitting him? We would! Is he likely to die in 30 days? Of course not. He is young and healthy; he could recover from being hit by a truck at high speed. But might he get worse and return to the ED dehydrated in a more advanced state of sepsis and require a longer convalescence? Possibly. Could a short hospitalization with intravenous antibiotics and close monitoring for clinical deterioration at best alter the course of his disease and at worst make him more comfortable than he would be alone in his apartment? Absolutely.

The point of these 2 examples is to reinforce the concept that the need for hospitalization is not necessarily dictated by the probability of death. Given the high oral bioavailability of quinolones and other antibiotics, hospitalization is unlikely to alter the ultimate outcome of pneumonia in many cases. Therefore, efforts that assume that the probability of mortality dictates patient disposition are fundamentally misguided.

The ability of a normal procalcitonin level to identify PSI classes IV and V suitable for discharge is further undermined by Table E1 from Huang et al, featured here. Here we see that 50% of patients with PSI class V and low procalcitonin level were in a state of severe sepsis on day 1 (12% of PSI IV cases had severe sepsis on day 1). Clearly, discharging patients with severe sepsis would not be possible.

Table 2.

Enrollment and admission rate by hospital.

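Hospital | Eligible, No. | Enrolled, No. | PSI Classes I and II, Patients Admitted to Hospital, No. | PSI Classes I and II, Patients Enrolled, No. | Patients Admitted/Enrolled (PSI I and II Only), Proportion
1 | 202 | 94 | 31 | 35 | 0.89
2 | 304 | 176 | 51 | 51 | 1.00
3 | 174 | 86 | 21 | 29 | 0.72
4 | 214 | 101 | 28 | 38 | 0.74
5 | 154 | 92 | 21 | 24 | 0.88
6 | 52 | 20 | 3 | 4 | 0.75
7 | 61 | 29 | 9 | 11 | 0.82
8 | 110 | 68 | 14 | 24 | 0.58
9 | 173 | 82 | 28 | 33 | 0.85
10 | 138 | 67 | 19 | 33 | 0.58
11 | 465 | 249 | 58 | 69 | 0.84
12 | 155 | 71 | 22 | 24 | 0.92
13 | 107 | 96 | 20 | 32 | 0.63
14 | 63 | 35 | 9 | 12 | 0.75
15 | 62 | 59 | 24 | 30 | 0.80
16 | 85 | 84 | 33 | 43 | 0.77
17 | 79 | 78 | 25 | 42 | 0.60
18 | 572 | 359 | 46 | 115 | 0.40
19 | 129 | 57 | 13 | 30 | 0.43
20 | 43 | 29 | 8 | 13 | 0.62
21 | 104 | 102 | 26 | 38 | 0.68
22 | 26 | 21 | 9 | 9 | 1.00
23 | 49 | 49 | 31 | 32 | 0.97
24 | 42 | 40 | 13 | 17 | 0.76
25 | 160 | 86 | 33 | 33 | 1.00
26 | 14 | 8 | 4 | 4 | 1.00
27 | 4 | 3 | 1 | 1 | 1.00
28 | 79 | 79 | 17 | 29 | 0.59
Total | 3820 | 2320 | 617 | 855 | 0.72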
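Table 2 shows that hospitals differed widely in how often they admitted low-risk (PSI I and II) patients, and that hospital 18 alone contributed 115 of the 855 such patients. As a small illustration of why pooled multicenter figures deserve scrutiny, the Python sketch below recomputes the pooled admission proportion and compares it with the simple average of the per-hospital proportions. The counts are transcribed from Table 2; the comparison itself is our own illustration, not an analysis reported by Huang et al.

```python
# (admitted, enrolled) PSI I and II patients per hospital, transcribed from Table 2.
PSI_LOW_RISK = [
    (31, 35), (51, 51), (21, 29), (28, 38), (21, 24), (3, 4), (9, 11),
    (14, 24), (28, 33), (19, 33), (58, 69), (22, 24), (20, 32), (9, 12),
    (24, 30), (33, 43), (25, 42), (46, 115), (13, 30), (8, 13), (26, 38),
    (9, 9), (31, 32), (13, 17), (33, 33), (4, 4), (1, 1), (17, 29),
]

admitted = sum(a for a, _ in PSI_LOW_RISK)
enrolled = sum(n for _, n in PSI_LOW_RISK)
site_rates = [a / n for a, n in PSI_LOW_RISK]

pooled = admitted / enrolled                      # weights every patient equally
site_average = sum(site_rates) / len(site_rates)  # weights every hospital equally

print(f"pooled: {admitted}/{enrolled} = {pooled:.2f}")
print(f"mean of hospital proportions = {site_average:.2f}")
print(f"range across hospitals = {min(site_rates):.2f} to {max(site_rates):.2f}")
```

By our arithmetic the pooled proportion is 0.72 while the hospital-weighted average is about 0.77, and the per-hospital proportions run from 0.40 to 1.00; the gap between the two summaries comes largely from the single large hospital with the lowest admission rate.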

Further, to use CURB-65, PSI, or procalcitonin to determine the ability to go home, one must assume that the act of hospitalizing the patient was not the very factor that led to the low observed death rate. We may find that the curious subgroup of patients with normal procalcitonin levels despite being in PSI class IV or V are the very patients in whom hospitalization dramatically affected the outcome. Perhaps if one has an increased procalcitonin level and PSI class V, the proverbial cat is out of the bag (these patients are doomed to die regardless of hospitalization), whereas those with normal procalcitonin levels are the very ones who have a less virulent sepsis phenotype and benefit the most from intense care, hydration, and other resuscitative efforts. In the end, all-cause mortality at 30 days has limited clinical relevance when one is deciding which patients can be safely discharged home, thus limiting the usefulness of the study findings.

Answer 3 


Q3. The authors use likelihood ratios to describe their results (their Table 3). What is a positive likelihood ratio (LR+)? A negative likelihood ratio (LR−)? Contrast likelihood ratios to other measures of test performance and describe their advantages and disadvantages. Why, in theory at least, are likelihood ratios particularly useful at the bedside? What does the negative likelihood ratio of 0.09 (95% confidence interval 0.02 to 0.36) found in this study mean quantitatively and qualitatively? What numbers is it based on (calculate it!)?

Sensitivity, specificity, positive predictive value, negative predictive value, LR+, and LR− are the most commonly reported measures of diagnostic test characteristics. These are all summary measures that attempt to reduce the classic 2×2 table (4 numbers) to fewer numbers to increase ease of use. When reporting research, authors should provide the actual numbers of true positives, true negatives, false positives, and false negatives because each of the aforementioned summary measures can be derived from them, whereas the converse is not true.

 | Disease (+) | Disease (−)
Test (+) | True positives (a) | False positives (b)
Test (−) | False negatives (c) | True negatives (d)

Sensitivity, specificity, and positive and negative predictive value have been discussed in a previous journal club.19

The bedside usefulness of sensitivity and specificity has been criticized because these measures describe the probability of a positive (negative) test result, given that the patient has (does not have) the disease, whereas clinicians want to know the probability of the disease, given a positive test result or the probability of the absence of disease, given a negative test result. These latter quantities are provided by the positive predictive value and negative predictive value, but their calculation requires the estimation of the pretest probability of disease, and therefore these quantities are not an inherent property of the test.

An alternative summary metric is the likelihood ratio. The likelihood ratio has the advantages of being independent of the pretest probability and of being easy to use in simple calculations that can be done at the bedside. The LR+ is the ratio of the probability of a positive test result among patients with the disease of interest to the probability of a positive test result among patients without the disease. Stated alternatively, LR+=true positive rate/false positive rate, which distills further to sensitivity/(1−specificity). Conversely, the LR− is the probability of a negative test result among those with the disease over the probability of a negative test result among those without the disease: LR−=false negative rate/true negative rate=(1−sensitivity)/specificity. Thus, likelihood ratios incorporate both sensitivity and specificity but do not require an estimation of pretest probability. Their strength lies in their mathematical properties, which allow the user to simply multiply the pretest odds of a patient having the disease by the LR+ of the test to yield the posttest odds of having the disease, given that the test result was positive (or by the LR− to yield the posttest odds, given a negative result).
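The definitions above can be sketched in Python, starting from the four cells (a, b, c, d) of the 2×2 table. The counts in the example are hypothetical, chosen only to show that the predictive values shift when the number of patients without disease changes, whereas sensitivity, specificity, and the likelihood ratios do not. The class name `TwoByTwo` is our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 diagnostic table (cell labels a-d as in the table)."""
    a: int  # true positives: test (+), disease (+)
    b: int  # false positives: test (+), disease (-)
    c: int  # false negatives: test (-), disease (+)
    d: int  # true negatives: test (-), disease (-)

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def ppv(self) -> float:
        # Depends on how common the disease is in this particular sample.
        return self.a / (self.a + self.b)

    @property
    def npv(self) -> float:
        return self.d / (self.c + self.d)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


# Hypothetical counts, for illustration only.
sample = TwoByTwo(a=45, b=300, c=5, d=150)
print(round(sample.sensitivity, 2), round(sample.specificity, 2))  # 0.9 0.33

# Same test, twice as many patients without disease: PPV falls, LRs do not move.
diluted = TwoByTwo(a=45, b=600, c=5, d=300)
print(round(sample.ppv, 3), round(diluted.ppv, 3))
print(round(sample.lr_positive, 2), round(diluted.lr_positive, 2))
```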

Readers should remember that the odds of disease are not the same as the probability of disease (the two are close only when the probability is small). The probability of an event is the number of ways it can happen divided by the total number of equally likely outcomes, so the probability of rolling a 6 on a fair die is 1 in 6, or 16.7%; the odds of an event are the number of ways it can happen divided by the number of ways it cannot, so the odds of rolling a 6 are 1 to 5 (1:5). Clinicians, unlike bookies, tend to think in terms of probabilities, so most will need to convert their pretest probability to pretest odds, using the formula odds=probability/(1−probability), and, after multiplying by the likelihood ratio, convert the posttest odds back to probability using probability=odds/(1+odds).
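The two conversion formulas can be written as a pair of small Python functions (names of our own choosing). Using exact fractions makes the die example, and the 10% pretest probability used in the next example, come out exactly.

```python
from fractions import Fraction


def probability_to_odds(p):
    """odds = probability / (1 - probability)"""
    return p / (1 - p)


def odds_to_probability(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)


# Rolling a 6 on a fair die: 1 favorable outcome out of 6 equally likely ones.
p_six = Fraction(1, 6)
odds_six = probability_to_odds(p_six)
print(p_six, odds_six)                   # 1/6 1/5
print(odds_to_probability(odds_six))     # 1/6

# A 10% pretest probability corresponds to 1:9 odds.
print(probability_to_odds(Fraction(1, 10)))  # 1/9
```

The same functions accept ordinary floats, for example `probability_to_odds(0.10)`.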

In Table 3 of the Huang et al study, the authors report the sensitivity and specificity of a 0.1 ng/mL cut point of procalcitonin for predicting 30-day all-cause mortality in PSI class IV and V patients as 0.98 and 0.27, respectively.

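Substituting these values, LR+=0.98/(1−0.27)=0.98/0.73≈1.34 and LR−=(1−0.98)/0.27=0.02/0.27≈0.074. There is a small difference between the results we calculate and the numbers reported in the table (eg, the reported LR− of 0.09), likely because of some rounding of the sensitivity and specificity estimates.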
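The Python sketch below reproduces this arithmetic and checks how far rounding alone could move LR−. It assumes, as an inference rather than something stated by Huang et al, that the published 0.98 and 0.27 were rounded to 2 decimal places, so the underlying values could lie anywhere in roughly 0.975 to 0.985 and 0.265 to 0.275.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded values reported for PSI class IV and V patients.
lr_pos, lr_neg = likelihood_ratios(0.98, 0.27)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")  # LR+ = 1.34, LR- = 0.074

# Corners of the assumed rounding intervals. LR- is very sensitive to the
# third decimal place of sensitivity because (1 - sensitivity) is so small.
for sens in (0.975, 0.985):
    for spec in (0.265, 0.275):
        print(sens, spec, round(likelihood_ratios(sens, spec)[1], 3))
```

Across these corners LR− ranges from about 0.055 to about 0.094, so the reported 0.09 is consistent with rounding of the published sensitivity.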

Now that we have these numbers, suppose you are caring for an elderly patient who you think has a 10% probability, or 1:9 odds, of developing a bad outcome. The procalcitonin test result is 0.23 ng/mL (positive). What are the posttest odds and probability of a bad outcome? Posttest odds of a bad outcome=pretest odds×LR+=1/9×1.34=1.34/9=0.149. (Posttest probability is 13% [0.149/(1+0.149)].) In this case, the test minimally altered our belief: we went from a 10% probability to a 13% probability, hardly a helpful change. Had the LR+ been 5, we would have gone from a 10% chance of a bad outcome to 36%; had the LR+ been 10, we would have gone from 10% to 53%; and had the LR+ been 20, we would have gone from 10% to 69%. It would take an LR+ of more than 81 to increase the posttest probability above 90%. Generally, an LR+ should be greater than 10 if the test is to alter the pretest odds enough to have clinical importance.
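The same calculation can be wrapped in a small function so that different likelihood ratios can be compared against the 10% pretest example. The function name is our own, and the sketch is simply Bayes' theorem in odds form as described above.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: posttest odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.10  # 10% probability, i.e., 1:9 odds

for lr in (1.34, 5, 10, 20, 81):
    print(f"LR+ = {lr:>5}: posttest probability = "
          f"{posttest_probability(PRETEST, lr):.0%}")
```

Running it gives 13%, 36%, 53%, 69%, and, for an LR+ of exactly 81, 90%, matching the figures in the text.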

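Now consider the same case if the procalcitonin level had been 0.02 ng/mL (negative result). Posttest odds of a bad outcome=pretest odds×LR−=1/9×0.074=0.0082. (Posttest probability is 0.8% [0.0082/(1+0.0082)].)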

In this example, we see that the LR− decreased the posttest probability by more than 1 order of magnitude (10% to less than 1%), and thus a negative test result leads to a clinically important reduction in the probability of a bad outcome. Generally, an LR− should be less than 0.1 to be truly useful. In this case, the LR− point estimate is reported to be 0.09, with a 95% CI of 0.02 to 0.36. The point estimate is below the 0.1 rule of thumb for clinical utility, but the upper bound of the CI (0.36) is well above it. Therefore, although the test may be useful to rule out a bad outcome, it cannot be wholly endorsed at this time.
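To make the width of the confidence interval concrete, the sketch below keeps the 10% pretest probability and computes the posttest probability after a negative result for the LR− calculated from the rounded sensitivity and specificity, the reported point estimate, and both ends of the reported 95% CI. The helper function is our own illustration.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Bayes' theorem in odds form: posttest odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


PRETEST = 0.10
scenarios = {
    "LR- from rounded sens/spec": 0.074,
    "reported point estimate": 0.09,
    "lower 95% CI bound": 0.02,
    "upper 95% CI bound": 0.36,
}
for label, lr in scenarios.items():
    print(f"{label:28} LR- = {lr:<5} -> {posttest_probability(PRETEST, lr):.1%}")
```

By our arithmetic, a negative result lowers the risk to about 0.2% at the lower CI bound but only to about 3.8% at the upper bound, which is why the upper bound limits how confidently the test can be used to rule out a bad outcome.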

For those who prefer to avoid this math but have their pretest probability and the likelihood ratio handy, the Fagan nomogram can be used to derive a posttest probability (Figure 2).20 Consider the same scenario. The pretest probability is 10%, the LR+ is 1.34, and the LR− is 0.074. Mechanically, one draws a line from the pretest probability through the likelihood ratio of interest and extends it to the far scale, where it intersects the posttest probability. In our example, the dashed line corresponds to the change in probability associated with a positive test result, whereas the dotted line corresponds to the change in probability for a negative test result. With a copy of the nomogram and a list of likelihood ratios for common tests in one's pocket or PDA, the modern emergency physician can rapidly determine whether a test will likely produce clinically fruitful results.
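As we understand its construction, the nomogram works because Bayes' theorem in odds form becomes a sum once logarithms are taken: the probability scales are spaced by log odds and the likelihood ratio scale is logarithmic, so a straight line across the three scales adds the log likelihood ratio to the log pretest odds. A brief derivation, using the positive-result numbers from the example above:

```latex
\text{posttest odds} = \text{pretest odds} \times \mathrm{LR}
\quad\Longleftrightarrow\quad
\log(\text{posttest odds}) = \log(\text{pretest odds}) + \log(\mathrm{LR})

\log_{10}\tfrac{1}{9} + \log_{10} 1.34 \approx -0.954 + 0.127 = -0.827,
\qquad 10^{-0.827} \approx 0.149
```

The result, posttest odds of about 0.149, is the same value obtained by direct multiplication.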

Answer 4 


Q4. Multicenter studies make it possible to enroll large numbers of subjects and offer a greater chance for external generalizability but present analytic challenges. What are some of the analytic challenges that arise from multicenter studies and what are some of the techniques used to overcome these? Consider issues about the presentation of results and the statistical analysis of the data. In this study, what information about the role of individual study sites would help readers better understand and interpret the meaning of the results?

Increasingly, we encounter multisite studies that provide the large numbers of subjects needed to detect small effects and, perhaps more important, broaden the generalizability of their findings.

Single-site studies almost always invite the following criticisms:


1) “Your patients are not like my patients. They are older/younger, sicker/healthier, and less or more likely to adhere to treatment than mine.”

2) “Your physicians are not like my physicians. Mine are smarter, taller, and better at following guidelines.”

3) “Your institution is not like mine. Mine is bigger/smaller with more or fewer computed tomography scanners, pharmacists, case managers, etc.”

The point of these criticisms is that outcomes observed at one site may differ from those at another, even when the intervention appears to be the same, because of differences in site-specific particulars. These threats to generalizability can be mitigated by including a variety of types of sites in a multicenter study. Adding sites has the further benefit of making it possible to recruit more subjects, thereby increasing the power of the study. On the other hand, analyzing data from multisite studies presents some complex and often-ignored challenges.

First, having recognized that site-specific characteristics play a part in the observed outcomes, investigators cannot analyze the data as if all patients came from the same site; they cannot simply pool all the observations and apply standard statistical tests. A key assumption of standard statistical tests is that observations are independent, that is, that information about one subject tells you nothing about any of the others. In a multicenter study, observations are likely nonindependent: knowing that subject 1 came from center A may provide information about other patients treated at A. Hypothesis tests and confidence limits that do not account for the clustering of data by site may provide biased results. This is particularly true when sites have unequal enrollment. What advantage does a multisite trial offer if 90% of the cases come from site 1 and 10% come from sites 2 to 15?

Accordingly, readers should expect authors to provide site-specific data and to use analyses that account for the nonindependence of subjects treated at the same site. In the procalcitonin study, we can infer that practice differed among sites by examining admission rates across sites (Table 2).

In this table, we see very uneven admission rates for PSI class I and II patients, with hospitals admitting from 40% to 100% of such patients. We cannot know why the admission rate varies so much from hospital to hospital, but this variation implies that there are differences in the patients or the physicians at individual hospitals that are not explained by the PSI score. Might these group-level differences also affect individual outcomes? We do not know whether mortality rates varied by institution or by hospital admission rate. Further, enrollment in this study was uneven across sites (as is usually the case). Should the 249 patients enrolled at site 11 carry the same weight as the 266 patients enrolled at sites 6 to 10 combined?
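The weighting question has no single right answer, but a toy calculation shows that it changes the result. The sketch below uses invented counts (not data from the procalcitonin study) for one large site and five small sites, and compares a patient-weighted (pooled) mortality rate with a site-weighted (average of site rates) one.

```python
from statistics import mean

# Hypothetical counts, NOT from the study: (patients enrolled, deaths) per site
sites = {
    "site 1": (250, 10),
    "site 2": (50, 6),
    "site 3": (50, 6),
    "site 4": (50, 6),
    "site 5": (50, 6),
    "site 6": (50, 6),
}


def pooled_rate(counts):
    """Patient-weighted: every patient counts equally, so large sites dominate."""
    total_patients = sum(n for n, _ in counts.values())
    total_deaths = sum(d for _, d in counts.values())
    return total_deaths / total_patients


def site_averaged_rate(counts):
    """Site-weighted: every site counts equally, regardless of enrollment."""
    return mean(d / n for n, d in counts.values())


print(f"patient-weighted mortality: {pooled_rate(sites):.1%}")
print(f"site-weighted mortality:    {site_averaged_rate(sites):.1%}")
```

Here the two approaches give 8.0% and 10.7%. Roughly speaking, a random-intercept multilevel model weights sites somewhere between these two extremes, depending on how much true between-site variation there is.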

An important point to remember is that whenever observations are clustered or nonindependent, point estimates calculated by standard techniques are more likely to be biased, and even when a point estimate is valid, its confidence limits may be too narrow. In general, properly computed CIs will be wider when observations are not independent, which may profoundly affect the conclusions that can be drawn from a study.
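How much wider? A common back-of-the-envelope measure is the Kish design effect, 1 + (m − 1) × ICC, where m is the number of patients per site and ICC is the intraclass correlation (how alike patients from the same site are). The sketch below assumes a hypothetical study of 1,500 patients spread evenly over 20 sites; unequal enrollment, as in this study, generally makes the design effect larger than this equal-size formula suggests.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


# Hypothetical study: 1,500 patients spread evenly over 20 sites (75 per site)
n_patients, n_sites = 1500, 20
m = n_patients / n_sites

for icc in (0.0, 0.01, 0.02, 0.05):
    deff = design_effect(m, icc)
    print(f"ICC = {icc:.2f}: design effect = {deff:.2f}, "
          f"effective n = {n_patients / deff:.0f}, CI width x{math.sqrt(deff):.2f}")
```

Even a modest ICC of 0.02 gives a design effect of 2.48: the 1,500 patients carry roughly the information of 605 independent ones, and CIs should be about 1.57 times as wide as a naive analysis would report.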

To account for the effects of clustering, researchers should use more robust statistical tools. In Stata, for example, the Huber-White (cluster-robust) variance estimator provides more appropriate confidence limits when data occur in clusters. Alternatively, and perhaps more important, multilevel modeling techniques are increasingly being used to estimate group-level and individual-level effects on individual outcomes.
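The idea behind the Huber-White (sandwich) approach can be shown in pure Python for the simplest case, estimating an overall mortality proportion. The site counts below are hypothetical, and the G/(G − 1) factor is one common small-sample correction; statistical packages may apply slightly different corrections. With only 4 sites the cluster-robust estimate is itself unreliable, because real analyses need many more clusters. The sketch only illustrates why the standard error grows when sites differ.

```python
import math

# Hypothetical (patients, deaths) per site; illustrative only, not study data
site_counts = [(100, 2), (100, 12), (50, 1), (50, 9)]

# Expand into one (site, outcome) record per patient; outcome 1 = death
records = [
    (site, 1 if i < deaths else 0)
    for site, (n, deaths) in enumerate(site_counts)
    for i in range(n)
]


def naive_se(records):
    """Standard error of a proportion that treats all patients as independent."""
    n = len(records)
    p = sum(y for _, y in records) / n
    return math.sqrt(p * (1 - p) / n)


def cluster_robust_se(records):
    """Huber-White (sandwich) SE of the same proportion, clustered by site.

    For an intercept-only model the sandwich reduces to the sum over sites of
    (sum of residuals within the site)^2 / N^2, multiplied here by G / (G - 1).
    """
    n = len(records)
    p = sum(y for _, y in records) / n
    residual_sums = {}
    for site, y in records:
        residual_sums[site] = residual_sums.get(site, 0.0) + (y - p)
    g = len(residual_sums)
    variance = sum(s ** 2 for s in residual_sums.values()) / n ** 2
    return math.sqrt(variance * g / (g - 1))


print(f"naive SE:          {naive_se(records):.4f}")
print(f"cluster-robust SE: {cluster_robust_se(records):.4f}")
```

The overall proportion is 8% either way, but the cluster-robust standard error (about 0.036) is more than twice the naive one (about 0.016) because mortality differs sharply between sites.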

Interested readers should consult the references, and we hope to discuss an article that uses multilevel modeling in a journal club in the near future.21, 22

Answer 5 


Q5. What would you choose as the next step in evaluating the effect of procalcitonin testing on pneumonia patients? Should we start using it in clinical practice and see how we like it? Should we test it in an external validation set (how is this done)? Should we conduct a randomized controlled trial? How would you design such a trial? What would the intervention be? What would the outcome of interest be?

This may be the most important question to ask after a thorough reading of any study. There never has been, nor will there ever be, a perfect clinical study. And although it is a good intellectual exercise to dismantle the methodology and results of other people's work, one is still left with the question of what to do next. In the case of procalcitonin, we have identified several problem areas that preclude endorsing this test at present. These include a relatively weak theoretical basis for choosing a positive versus negative cut point, uncertainty about which outcomes would be the most clinically relevant, the possibility that hospitalization itself affected the clinical outcome, and the imprecision of the LR− estimates. These negatives are counterbalanced by the study's strengths: previous research that is at least somewhat supportive of procalcitonin's ability to identify low-risk patients (perhaps those who did not really have pneumonia), a well-defined methodology, an unambiguous outcome, and an LR− point estimate below 0.1, the rule-of-thumb cut point for a useful test. It is not clear to what extent the PSI and CURB-65 influence clinical practice, and it is completely unknown whether procalcitonin levels would affect practice. Consider that 13 of the 28 enrolling hospitals admitted more than 80% of their pneumonia patients with PSI class I or II. In fact, 72% of all PSI class I and II patients were admitted across all 28 hospitals (Table 2). These patients are already predicted to have exceptionally low mortality, and still the majority were admitted. If nearly three quarters of PSI class I and II patients are admitted, will a normal procalcitonin level give physicians the courage to discharge PSI class V patients? If it does, would it be false courage?

The manufacturer of the procalcitonin assay will likely market the product by encouraging physicians to try the test and see how they like it. This would be akin to the introduction of the B-type natriuretic peptide (BNP) test. Now, a decade later, BNP testing has been incorporated into clinical practice in many regions of the United States but has yet to prove its merit as a diagnostic test worthy of altering an emergency physician's clinical judgment. Given the certain costs of the procalcitonin test and the uncertainty about its true clinical utility, it seems unwise simply to adopt the test. In that vein, the authors correctly “recommend that procalcitonin as an adjunct to clinical tools be tested prospectively before wider use,” and “emphasize that procalcitonin level should never be used in isolation to make clinical decisions and does not replace physician assessment.” So what study design would be most feasible to determine the utility of procalcitonin as an adjunct to physician assessment?

A common strategy for determining the validity of a new clinical decision aid, whether a laboratory test or a clinical decision rule, is to validate it in a population distinct from the one in which it was derived (external validation). This may be particularly important for procalcitonin because the decision to apply normal procalcitonin levels only to PSI class IV and V cases seems at least somewhat post hoc. Thus, examining procalcitonin levels in pneumonia patients from a different population could confirm (or refute) the finding that negative procalcitonin levels portend low 30-day mortality in the higher PSI classes. Validating the findings of this study will likely require a costly prospective study of similar magnitude to the one just completed. But this may not be the best way to proceed: this approach addresses neither the criticism that 30-day all-cause mortality is not the most relevant outcome nor the question of whether procalcitonin results should really alter clinical judgment.
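In an external validation set, the key quantity would again be the LR− and, above all, its CI. The sketch below uses the conventional log-scale ("log method") CI for a likelihood ratio; the study authors may have used a different method. It is applied to two hypothetical validation cohorts (invented counts, not study data) that share an LR− of 0.09 but differ in size.

```python
import math


def lr_negative_with_ci(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """LR- = (1 - sensitivity) / specificity, with a log-scale 95% CI.

    tp/fn: patients WITH the bad outcome whose test was positive/negative.
    fp/tn: patients WITHOUT the bad outcome whose test was positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_neg = (1 - sensitivity) / specificity
    # Standard error of ln(LR-) from the usual delta-method formula
    se_log = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (tn + fp))
    lower = math.exp(math.log(lr_neg) - z * se_log)
    upper = math.exp(math.log(lr_neg) + z * se_log)
    return lr_neg, lower, upper


# Two hypothetical validation cohorts with the same LR- but different sizes
cohorts = {
    "1,000 patients": dict(tp=97, fp=600, fn=3, tn=300),
    "4,000 patients": dict(tp=388, fp=2400, fn=12, tn=1200),
}
for label, counts in cohorts.items():
    lr, lo, hi = lr_negative_with_ci(**counts)
    print(f"{label}: LR- = {lr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The smaller cohort yields a CI of about 0.03 to 0.28 and the larger one about 0.05 to 0.16. Because bad outcomes (and thus false negatives) are rare, even a fourfold larger cohort leaves the upper bound above the 0.1 rule of thumb, which is one reason such a validation study would be large and costly.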

Another option would be to draw blood for procalcitonin testing in the ED from all pneumonia patients and randomly disclose the procalcitonin level to the emergency physician. This randomized, controlled trial of the effect of procalcitonin testing on patient health would provide direct evidence of the test's benefit. What should the primary outcome of this trial be? Thirty-day mortality? For the reasons discussed above, it seems unlikely that knowledge of the procalcitonin level would change treatment drastically enough to produce a marked change in 30-day mortality. Should we measure admission rates or hospital length of stay? These are no doubt the intermediate outcomes for which the procalcitonin level would be expected to add to clinical judgment, and perhaps they can be used. However, without a measure of the patient's health status, such outcomes may be inadequate to define the utility of the test. Should we measure patients' health status at 3 days, 1 week, 2 weeks, or 30 days to determine whether those not admitted felt worse, took longer to recover at home, or, conversely, were very happy not to be hospitalized? Should costs be considered? How much is an extra day sick at home worth? These methodological questions illustrate how even a prospective, randomized trial will be open to criticism regardless of the choices made. A trial that measures admission rate, return visit rate, hospital length of stay, and some form of short-term health status will give us the best chance of understanding the influence of procalcitonin on clinical practice.

References 


1. Huang DT, Weissfeld LA, Kellum JA, et al. Risk prediction with procalcitonin and clinical rules in community-acquired pneumonia. Ann Emerg Med. 2008;52:48–58.

2. Schriger DL. Suggestions for improving the reporting of clinical research: the role of narrative. Ann Emerg Med. 2005;45:437–443.

3. Christ-Crain M, Muller B. Procalcitonin in bacterial infections—hype, hope, more or less? Swiss Med Wkly. 2005;135:451–460.

4. Luzzani A, Polati E, Dorizzi R, et al. Comparison of procalcitonin and C-reactive protein as markers of sepsis. Crit Care Med. 2003;31:1737–1741.

5. Casado-Flores J, Blanco-Quiros A, Asensio J, et al. Serum procalcitonin in children with suspected sepsis: a comparison with C-reactive protein and neutrophil count. Pediatr Crit Care Med. 2003;4:190–195.

6. Caterino JM, Scheatzle MD, Forbes ML, et al. Bacteremic elder emergency department patients: procalcitonin and white count. Acad Emerg Med. 2004;11:393–396.

7. Uzzan B, Cohen R, Nicolas P, et al. Procalcitonin as a diagnostic test for sepsis in critically ill adults and after surgery or trauma: a systematic review and meta-analysis. Crit Care Med. 2006;34:1996–2003.

8. Christ-Crain M, Stolz D, Bingisser R, et al. Procalcitonin-guidance of antibiotic therapy in community-acquired pneumonia: a randomized trial. Am J Respir Crit Care Med. 2006;174:84–93.

9. Stolz D, Christ-Crain M, Bingisser R, et al. Antibiotic treatment of exacerbations of COPD: a randomized, controlled trial comparing procalcitonin-guidance with standard therapy. Chest. 2007;131:9–19.

10. Christ-Crain M, Jaccard-Stolz D, Bingisser R, et al. Effect of procalcitonin-guided treatment on antibiotic use and outcome in lower respiratory tract infections: cluster-randomised, single-blinded intervention trial. Lancet. 2004;363:600–607.

11. Tang BM, Eslick GD, Craig JC, et al. Accuracy of procalcitonin for sepsis diagnosis in critically ill patients: systematic review and meta-analysis. Lancet Infect Dis. 2007;7:210–217.

12. Jones AE, Fiechtl JF, Brown MD, et al. Procalcitonin test in the diagnosis of bacteremia: a meta-analysis. Ann Emerg Med. 2007;50:34–41.

13. Masia M, Gutierrez F, Shum C, et al. Usefulness of procalcitonin levels in community-acquired pneumonia according to the Patients Outcome Research Team Pneumonia Severity Index. Chest. 2005;128:2223–2229.

14. Beovic B, Kreft S, Osredkar J, et al. Serum procalcitonin levels in patients with mild community-acquired pneumonia. Clin Microbiol Infect. 2005;11:1050–1051.

15. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10:37–48.

16. Lagakos SW. The challenge of subgroup analyses—reporting without distorting. N Engl J Med. 2006;354:1667–1669.

17. Assmann SF, Pocock SJ, Enos LE, et al. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet. 2000;355:1064–1069.

18. Yealy DM, Auble TE, Stone RA, et al. The Emergency Department Community-Acquired Pneumonia Trial: methodology of a quality improvement intervention. Ann Emerg Med. 2004;43:770–782.

19. Barrett TW, Schriger DL. Practical considerations in HIV testing in the emergency department, characteristics of diagnostic tests, and the role of sensitivity analysis in observational studies: answers to March 2008 Journal Club questions. Ann Emerg Med. 2008;52:170–181.

20. Fagan TJ. Letter: nomogram for Bayes theorem. N Engl J Med. 1975;293:257.

21. Singer JD, Willett JB. Applied Longitudinal Data Analysis. Oxford, England: Oxford University Press; 2003.

22. Rabe-Hesketh S, Skrondal A. Multilevel and Longitudinal Modeling Using Stata. 2nd ed. College Station, TX: StataCorp; 2008.

a University of California, Irvine, Irvine, CA

b University of California, Los Angeles, Los Angeles, CA

c Vanderbilt University Medical Center, Nashville, TN

 Editor's Note: You are reading answers to the fourth installment of Annals of Emergency Medicine Journal Club. The questions and the article they are about (Huang et al. Ann Emerg Med. 2008;52:48-58) were published in the July 2008 issue.1 We thank Dr. Huang and his colleagues for sharing additional data with us, which we used in answering question 4.

Information about journal club can be found at http://www.annemergmed.com/content/journalclub.

Readers should recognize that these are suggested answers. We hope they are accurate; we know that they are not comprehensive. There are many other points that could be made about these questions or about the article in general. Questions are rated “novice,” “intermediate,” and “advanced” so that individuals planning a journal club can assign the right question to the right student. The “novice” rating does not imply that a novice should be able to spontaneously answer the question. “Novice” means we expect that someone with little background should be able to do a bit of reading, formulate an answer, and teach the material to others. Intermediate and advanced questions also will likely require some reading and research, and that reading will be sufficiently difficult that some background in clinical epidemiology will be helpful in understanding the reading and concepts.

We are interested in receiving feedback about this feature. Please e-mail journalclub@acep.org with your comments.

PII: S0196-0644(08)01487-X

doi:10.1016/j.annemergmed.2008.07.010

