Annals of Emergency Medicine
Volume 45, Issue 3, Pages 291-294, March 2005

Placing the Bayesian Network Approach to Patient Diagnosis in Perspective

  • Ari M. Lipsky, MD
  • Roger J. Lewis, MD, PhD

Address for correspondence: Ari M. Lipsky, MD, Department of Emergency Medicine, 1000 W. Carson Street, Box 21, Torrance, CA 90509; 310-222-3500, fax 310-782-1763

From the Department of Emergency Medicine at Harbor–UCLA Medical Center, Torrance, CA (Lipsky, Lewis), and the Department of Medicine at the David Geffen School of Medicine at University of California–Los Angeles, Los Angeles, CA (Lewis)

published online 18 January 2005.

SEE RELATED ARTICLE, P. 282.


In this month's issue of Annals, Kline et al1 report the creation and validation of a Bayesian network designed to identify a low-risk subset of patients suspected of having a venous thromboembolism, using only readily available clinical data. During validation, they found that half of their patients were categorized as low risk by the network and that 98.5% (95% confidence interval [CI] 97.2% to 99.2%) of these patients were determined to be free of disease. The clinical significance of this work is readily apparent: if data from our initial assessment (ie, history, physical examination, pulse oximetry) can comfortably exclude venous thromboembolism in half of the patients in whom we consider the diagnosis, then the potential savings to the patient and the health care system may be quite substantial. We would be able to reserve the use of d-dimer tests and lower extremity and pulmonary imaging studies—which have their own inherent costs in terms of time, pain, associated risks, and potential false positive results—for patients not in this low-risk category.

In addition to its potential clinical utility, this research is noteworthy for the authors' implementation of a Bayesian network, an important step forward in the application of Bayesian thinking to the clinical practice of medicine. In this editorial, we describe how evaluations of diagnostic tools have become more “Bayesian” over time, where the Bayesian network approach fits in this evolution, and how the Bayesian network captures much of the essence of clinical reasoning.

Frequentists' failings 

The first use of statistical methods to quantify the utility of diagnostic tests—and clinical evaluation tools more generally—relied on classical or frequentist P values. Specifically, a diagnostic test was used to divide patients into 2 groups (eg, those with positive versus those with negative results), and a P value was used to demonstrate a difference in disease prevalence between the groups. Unfortunately, the P value provides little information regarding the magnitude of that difference, and it is therefore often useless to the clinician who is trying to estimate the probability of disease on the basis of the test result.2

Ranson's criteria for the risk stratification of patients with acute pancreatitis, as an example, were supported statistically using P values.3 The clinician trying to understand the clinical relevance of positive Ranson's criteria, however, can only conclude that a patient who meets high-risk criteria is likely worse off than a patient who does not meet the criteria. There is a dissonance between the yes-no information available to the clinician and the clinician's natural approach to decisionmaking, which requires addressing “how much worse” or “how much more likely.”

The next step: sensitivity and specificity 

The reporting of sensitivity and specificity brings us a little closer to our goal of the useful quantification of diagnostic test information. Sensitivity is the fraction of patients with a disease for whom a diagnostic test is positive; it measures a test's ability to detect disease. Specificity is the fraction of patients without a disease for whom a diagnostic test is negative; it measures a test's ability to correctly identify patients who are free of disease.
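
As a minimal sketch of these two definitions, the following Python fragment computes sensitivity and specificity from the four cells of a 2×2 table. The counts are hypothetical and chosen only for illustration; they do not come from any study cited here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients WITH disease whose test is positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of patients WITHOUT disease whose test is negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed for illustration only).
tp, fn = 90, 10    # 100 patients with disease
tn, fp = 855, 45   # 900 patients without disease

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Note that each function looks at only one column of the table: the diseased patients for sensitivity, the nondiseased patients for specificity.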

Although these measures provide more meaningful information than P values about the utility of a specific test or set of clinical criteria, they have important shortcomings.4 Sensitivity and specificity are calculated from populations that have already been separated into disease and nondisease states (ie, sensitivity includes only those patients with disease, and specificity includes only those patients without disease). However, the clinician orders the test precisely to determine whether the patient has disease. Sensitivity and specificity tell us the probability of a test result given the presence or absence of disease, when what we really want to know is the reverse: given a test result, what is the probability of disease.

Toward prediction 

The next conceptual step in the evaluation of diagnostic strategies is the predictive value. A positive predictive value is the fraction of patients with a positive test who actually have the disease, and a negative predictive value is the fraction of patients with a negative test who are truly disease free. Predictive values enable clinicians to interpret test results in a more clinically meaningful way: the result of a test can be used to determine the probability of disease.

The predictive value, however, combines both test-specific characteristics and disease prevalence.4 The inseparability of these 2 components means that any particular predictive value for a test applies only to a population with similar prevalence or pretest probability. It also means that we cannot use successive tests to further refine our estimation of the probability of disease, because a specific pretest estimation (which ought to change depending on the available information) has been incorporated into the predictive value. What we desire is a measure of test “strength” that allows us to postulate our own pretest estimation of disease and then revise our estimation of disease using the test result.
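
The dependence of predictive values on prevalence can be made concrete with a short sketch. The test characteristics below (90% sensitivity, 95% specificity) and the three prevalence values are assumptions chosen for illustration, not data from any cited study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, three hypothetical populations.
for prevalence in (0.01, 0.20, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The test itself never changes, yet the positive predictive value rises steeply as prevalence increases, which is why a published predictive value cannot simply be carried over to a population with a different pretest probability.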

The Bayesian world 

As clinicians, we instinctively make estimations about the probability or risk that a patient has a particular disease. Then, as we assimilate more information (eg, additional history or physical examination findings, laboratory results, or consultants' opinions), we modify our risk-estimates until we think that we have reasonably excluded—or ruled in—the disease.

For instance, consider the evaluation of a patient presenting with cough and rales. The initial differential may include pneumonia and congestive heart failure, and we may initially believe that they are equally likely (ie, 50% and 50%). A chest radiograph is performed that shows increased vascular markings and possibly some Kerley B lines. With this new information, the probabilities we initially assigned to pneumonia and congestive heart failure as causes should be updated; they might now be 20% and 80%, respectively. And perhaps a few minutes later, a normal rectal temperature is obtained, leading us to further increase our estimate of the probability of congestive heart failure. This entire process, even if not explicitly quantified, is integral to the clinical decisionmaking process.

The modification of our initial beliefs to reflect the accumulating data is the essence of a theorem likely first postulated by Thomas Bayes in the 18th century.5 This theorem enables us to answer the key question, “How much more likely?”6

For either a positive or negative test result, Bayes' theorem can be written:

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid ND)\,P(ND)}$$

where P(D|T) is the revised or posterior probability of the disease D given the test result T; P(T|D) is the probability of the test result given the disease D; P(D) is the initial or prior probability of the disease D; P(T|ND) is the probability of the test result when there is no disease present, denoted ND; and P(ND) is the prior probability of no disease.

The prior probabilities, P(D) and P(ND), reflect our intuition and all the information available to us before we obtain the test. In the absence of any other information, we may decide that they should be equal to the general prevalence of disease (P(D)) and its complement (P(ND)). When we decide to conduct an additional test, the posterior probability obtained after the first test becomes the prior probability for evaluating the successive test. This is exactly the process of updating our beliefs given accumulating information.
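
The updating process can be sketched in a few lines of Python. The prior and the test characteristics below are hypothetical, and chaining the two updates assumes that the second test result is independent of the first once the patient's true disease state is fixed; that assumption is ours and would need to be checked for any real pair of tests.

```python
def posterior(prior, p_result_given_disease, p_result_given_no_disease):
    """Bayes' theorem: P(D|T) from P(D), P(T|D), and P(T|ND)."""
    numerator = p_result_given_disease * prior
    denominator = numerator + p_result_given_no_disease * (1 - prior)
    return numerator / denominator


prior = 0.10  # hypothetical pretest probability of disease

# First test is positive: P(T+|D) = 0.80, P(T+|ND) = 0.20 (hypothetical).
after_first = posterior(prior, 0.80, 0.20)

# The posterior from the first test becomes the prior for the second.
# Second test is positive: P(T+|D) = 0.70, P(T+|ND) = 0.10 (hypothetical).
after_second = posterior(after_first, 0.70, 0.10)

print(f"prior {prior:.1%} -> {after_first:.1%} -> {after_second:.1%}")
```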

Likelihood ratios 

The conditional probabilities, P(T|D) and P(T|ND), when used in Bayes' theorem, are called likelihood values—they measure the likelihood of a certain test outcome given the presence or absence of disease. A ratio may be formed by dividing the likelihood of a positive test result in patients with disease (ie, sensitivity) by the likelihood of a positive test result in patients without disease (ie, 1−specificity). This ratio is termed the likelihood ratio for a positive test (LR+). The LR+, which should be greater than 1, measures how much more likely it is that someone with disease will have a positive test compared with someone without disease.

Conversely, the negative LR, or LR−, is the likelihood of a negative test result in patients with disease (ie, 1−sensitivity) divided by the likelihood of a negative test result in patients without disease (ie, specificity). The LR−, which should be less than 1, tells us how much less likely it is that someone with disease will test negative compared with someone without disease.
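
Written out, the two ratios described above are as follows. The symbols T+ and T− for a positive and a negative test result are our shorthand and do not appear elsewhere in the text.

```latex
\mathrm{LR}^{+} = \frac{P(T^{+} \mid D)}{P(T^{+} \mid ND)}
               = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
\mathrm{LR}^{-} = \frac{P(T^{-} \mid D)}{P(T^{-} \mid ND)}
               = \frac{1 - \text{sensitivity}}{\text{specificity}}
```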

The LR for a test result enables us to update our estimate of the probability of disease.4, 7 To use the LR, we convert the pretest probability of disease into pretest odds, multiply by the LR for the test result, and then convert the posttest odds back into a posttest probability (for details, see reference 4). As an example, suppose we believe a patient with pancreatitis has a prior probability of bad outcome of 9% (odds 0.10). Using Ranson's original data, we can calculate that the presence of 3 or more criteria has an LR+ of 44, while the presence of fewer than 3 criteria has an LR− of 0.36.3 If our patient has 3 or more positive criteria, then his revised probability of bad outcome is 81% (odds 0.10×44=4.4). If he has fewer than 3 criteria, then the revised probability is 3.4% (odds 0.10×0.36=0.04). In the former case, we start looking for an ICU bed; in the latter, we might be comfortable with a ward admission. In either case, the LRs enable us to combine our prior estimate of the probability of poor outcome with new clinical information, which would be impossible with only a P value.
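
The three steps (probability to odds, multiply by the LR, odds back to probability) can be sketched as below, using the 9% prior and the two LRs quoted above. The function names are ours. The code carries the unrounded odds (about 0.099 rather than 0.10), so intermediate values differ slightly from the rounded figures in the text while the final probabilities agree.

```python
def probability_to_odds(p):
    return p / (1 - p)


def odds_to_probability(odds):
    return odds / (1 + odds)


def update_with_lr(pretest_probability, likelihood_ratio):
    """Return the posttest probability after applying one likelihood ratio."""
    pretest_odds = probability_to_odds(pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return odds_to_probability(posttest_odds)


prior = 0.09  # prior probability of bad outcome, from the example above

high_risk = update_with_lr(prior, 44)    # 3 or more criteria (LR+ = 44)
low_risk = update_with_lr(prior, 0.36)   # fewer than 3 criteria (LR- = 0.36)

print(f"3 or more criteria:    {high_risk:.1%}")  # 81.3%
print(f"fewer than 3 criteria: {low_risk:.1%}")   # 3.4%
```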

Tackling real world complexity: Bayesian networks 

The application of Bayesian thinking to an individual diagnostic “test,” whether an element of the history or physical examination, a laboratory test, an imaging study, or a consultant's opinion, can be summarized as follows: we modify our initial or prior suspicion of disease using the likelihood for the result obtained to produce our new or posterior suspicion of disease. In other words, we draw conclusions about the cause (ie, the disease) on the basis of information available to us about the effect (ie, the test result).

To make a graph of this process for pneumonia (disease) and chest radiograph (test), we draw a simple PNA→CXR diagram, where PNA represents the presence of pneumonia, which in a high-risk, symptomatic emergency department population may have a probability of 20%, and CXR represents the result of the chest radiograph. CXR is said to be conditionally dependent on PNA, meaning that the probability of CXR being positive depends on whether PNA is positive. In other words, the presence or absence of pneumonia influences the findings on the chest radiograph. We might assume that if PNA is positive (pneumonia is present), CXR is positive (infiltrate on chest film) 90% of the time; if PNA is negative (no pneumonia), CXR is still positive 5% of the time because of some other cause. “Reading” the diagram forward (ie, in the direction of causality, from PNA to CXR), we could say that “in those 20% of patients who have pneumonia, the chest film is positive 90% of the time, and in the 80% of patients who do not have pneumonia the chest film is still positive 5% of the time.”

We can also “read” the diagram backward; namely, we can determine the probability of PNA given a value for CXR. This is the more clinically useful, Bayesian-style question: “In patients with a positive chest radiograph, what is the probability of pneumonia?” Applying Bayes' theorem and using the probabilities specified above in the diagram, we can answer this question (the result is 82%).
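
The backward reading can be reproduced directly from the three numbers given for the PNA→CXR diagram. The variable names in this sketch are ours. The probability of pneumonia after a negative film is not stated in the text; it is included only as a further illustration of the same calculation.

```python
p_pna = 0.20                   # P(pneumonia) in this population
p_cxr_pos_given_pna = 0.90     # P(positive film | pneumonia)
p_cxr_pos_given_no_pna = 0.05  # P(positive film | no pneumonia)

# Forward reading: overall probability of a positive film.
p_cxr_pos = (p_cxr_pos_given_pna * p_pna
             + p_cxr_pos_given_no_pna * (1 - p_pna))

# Backward reading: Bayes' theorem for a positive film.
p_pna_given_cxr_pos = p_cxr_pos_given_pna * p_pna / p_cxr_pos

# Same calculation for a negative film (not stated in the text).
p_pna_given_cxr_neg = ((1 - p_cxr_pos_given_pna) * p_pna
                       / (1 - p_cxr_pos))

print(f"P(positive film)              = {p_cxr_pos:.0%}")            # 22%
print(f"P(pneumonia | positive film)  = {p_pna_given_cxr_pos:.0%}")  # 82%
print(f"P(pneumonia | negative film)  = {p_pna_given_cxr_neg:.1%}")  # 2.6%
```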

We can extend our diagram by adding another variable—RR, “respiratory rate >20”—which is also conditionally dependent on the presence of pneumonia. Our diagram would look like this: CXR←PNA→RR. Perhaps RR (tachypnea) is positive 80% of the time when PNA is positive, and only 10% of the time when PNA is not positive. If we know the values of CXR and RR, we can once again use Bayes' theorem to find the value we are actually interested in—the probability of pneumonia. Interestingly, if we knew that CXR was positive but had no information about RR, our probability of PNA being positive would appropriately increase (as in the PNA→CXR example), but so would the probability of RR being positive. This is because whether RR is positive depends on PNA: the more likely it is that PNA is positive, the more likely it is that RR is positive as well.
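
Both claims in this paragraph can be checked by brute force: list all 8 combinations of the 3 variables, compute the probability of each from the diagram, and add up the ones that match. The sketch below does this with the probabilities given above; the function names and the dictionary-based interface are ours. The resulting figures (about 97% and 67%) are not stated in the text and follow only from these illustrative inputs.

```python
from itertools import product

P_PNA = 0.20
P_CXR_GIVEN_PNA = {True: 0.90, False: 0.05}  # P(CXR positive | PNA state)
P_RR_GIVEN_PNA = {True: 0.80, False: 0.10}   # P(RR positive | PNA state)


def bernoulli(p_true, value):
    """Probability of `value` for a yes/no variable that is True with p_true."""
    return p_true if value else 1 - p_true


def joint(pna, cxr, rr):
    """P(PNA, CXR, RR) as implied by the diagram CXR <- PNA -> RR."""
    return (bernoulli(P_PNA, pna)
            * bernoulli(P_CXR_GIVEN_PNA[pna], cxr)
            * bernoulli(P_RR_GIVEN_PNA[pna], rr))


def probability(query, evidence):
    """P(query | evidence); both are dicts keyed by 'pna', 'cxr', 'rr'."""
    numerator = denominator = 0.0
    for pna, cxr, rr in product([True, False], repeat=3):
        state = {"pna": pna, "cxr": cxr, "rr": rr}
        if all(state[k] == v for k, v in evidence.items()):
            p = joint(pna, cxr, rr)
            denominator += p
            if all(state[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


p_pna_both = probability({"pna": True}, {"cxr": True, "rr": True})
p_rr_prior = probability({"rr": True}, {})
p_rr_given_cxr = probability({"rr": True}, {"cxr": True})

print(f"P(PNA | CXR+, RR+) = {p_pna_both:.1%}")
print(f"P(RR+)             = {p_rr_prior:.1%}")
print(f"P(RR+ | CXR+)      = {p_rr_given_cxr:.1%}")
```

Knowing only that the film is positive raises the probability of tachypnea from 24% to about 67%, even though the film and the respiratory rate are linked only through pneumonia.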

We have just created a simple Bayesian network.8 A Bayesian network consists of nodes (representing variables) and lines (also called arcs or edges) connecting the nodes. The network is directed, meaning that each line is unidirectional, so that the node on one side of a line is seen as the predecessor of the node on the other side (“cause precedes effect”). The second node's value is said to be conditionally dependent on the first node, and not the other way around. This may, for instance, represent a cause and effect such as pneumonia and chest radiograph changes. A Bayesian network is also acyclic, which means that tracing the flow from one variable around the network (along the direction of the lines) will never bring you back to the original variable.

The full specification of a Bayesian network requires encoding 2 elements. First, we need to know the structure of the network: which individual nodes are connected and in what direction. Note that each node can be connected to multiple other nodes. Second, we need to specify the conditional probabilities for each individual node. For instance, we may encode for the chest film node that it is positive in 90% of patients who have pneumonia and in 5% of the patients who do not have pneumonia. Nodes that only have lines leading from them (eg, PNA), so-called roots, only require specifying their baseline probabilities of being positive. All nodes that receive lines (eg, CXR and RR) require that we encode the conditional probabilities with respect to each of their immediate predecessors (“parents”).
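
One way to encode these 2 elements is sketched below for the pneumonia example: a table of parents for the structure, and a table of conditional probabilities for each node. The data layout and function names are our own illustration, not the representation used by Kline et al. The acyclicity check relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Element 1, structure: each node lists its immediate predecessors (parents).
PARENTS = {"PNA": (), "CXR": ("PNA",), "RR": ("PNA",)}

# Element 2, probabilities: P(node is positive | values of its parents).
# Roots have a single entry keyed by the empty tuple (their baseline rate).
CPT = {
    "PNA": {(): 0.20},
    "CXR": {(True,): 0.90, (False,): 0.05},
    "RR": {(True,): 0.80, (False,): 0.10},
}


def topological_order(parents):
    """Return nodes ordered parents-first; raises CycleError if cyclic."""
    return tuple(TopologicalSorter(parents).static_order())


def roots(parents):
    """Nodes with no incoming lines."""
    return [node for node, node_parents in parents.items() if not node_parents]


def joint_probability(assignment, parents=PARENTS, cpt=CPT):
    """P(all nodes take the given True/False values)."""
    p = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[q] for q in node_parents)
        p_true = cpt[node][parent_values]
        p *= p_true if assignment[node] else 1 - p_true
    return p


print(topological_order(PARENTS))  # PNA comes before CXR and RR
print(roots(PARENTS))              # ['PNA']
print(joint_probability({"PNA": True, "CXR": True, "RR": False}))
```

A node with several parents would simply have one table entry per combination of parent values, which is why the number of probabilities to specify grows quickly as lines are added.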

Where do the structure and probabilities come from? 

The structure of the network may be based on our beliefs about causation, for instance, that pneumonia is the cause and a positive chest radiograph the effect. It may also be selected by a computer algorithm, with the utility of potential networks compared against a benchmark. In the article by Kline et al,1 multiple versions of the network structure were generated and another algorithm was used to select the best one. While we might like to believe we can determine causality by examining the links between the nodes, the ideal network may not reflect our beliefs.9 The most important criterion for selecting a network is that it produces the correct answer with a high probability, not that it reflects our personal belief structure.

The probabilities for the individual nodes can also be encoded directly on the basis of available information or expert opinion. Alternatively, the probabilities can be learned from a derivation data set, as they were in Kline et al's1 study.

What are the advantages of Bayesian networks? 

Bayesian networks are powerful tools that apply intuitive reasoning methods to multiple variables simultaneously. The structure that results, unlike a neural network, is interpretable: we can understand how the network processes a patient's data, even if we cannot manage the calculation of the desired, final probability without a computer. Bayesian networks also avoid many of the overfitting issues present in other data mining techniques.10

Perhaps the most important advantage of the Bayesian network is its ability to handle missing data by incorporating the most likely value(s) of the missing variable(s) based on the values of all the non-missing variables. Examining Figure A1,1 we realize that “VTE +” is treated just like any other missing variable: its value is inferred from all the other available data and the known conditional probabilities.
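
For the simple pneumonia network used earlier, in which each finding depends only on the disease, handling a missing finding is especially easy: summing over both possible values of an unobserved finding contributes a factor of 1, so the finding drops out of the calculation. The sketch below illustrates this with the same illustrative probabilities; it is our simplification and not the network or algorithm of Kline et al, whose structure is richer.

```python
PRIOR = 0.20  # P(pneumonia)

# finding -> (P(present | pneumonia), P(present | no pneumonia))
FINDINGS = {"cxr": (0.90, 0.05), "rr": (0.80, 0.10)}


def p_pneumonia(observed):
    """P(pneumonia | observed findings).

    `observed` maps a finding name to True or False. Findings left out of
    the dict are treated as missing and are marginalized away.
    """
    with_disease = PRIOR
    without_disease = 1 - PRIOR
    for name, present in observed.items():
        p_d, p_nd = FINDINGS[name]
        with_disease *= p_d if present else 1 - p_d
        without_disease *= p_nd if present else 1 - p_nd
    return with_disease / (with_disease + without_disease)


for observed in ({}, {"cxr": True}, {"rr": True}, {"cxr": True, "rr": True}):
    print(f"{observed}: {p_pneumonia(observed):.1%}")
```

The same function gives a usable answer whether none, some, or all of the findings are available, which is the practical appeal of the approach at the bedside.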

The future 

The statistical analysis of diagnostic tests and clinical risk stratification strategies has grown remarkably in both sophistication and relevance. Whereas LRs are useful tools for understanding the importance of specific test results, we are now capable of using networks of test results and other clinical data in a quantitative manner to assist us in diagnosis. Further investigation of Bayesian networks and their incorporation into clinical care will likely benefit both the patient and the clinician.

References 

  1. Kline JA, Novobilski AJ, Kabrhel C, et al. Derivation and validation of a Bayesian network to predict pretest probability of venous thromboembolism. Ann Emerg Med. 2005;45:282–290
  2. Goodman SN. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med. 1999;130:995–1004
  3. Ranson JH, Rifkind KM, Roses DF, et al. Prognostic signs and the role of operative management in acute pancreatitis. Surg Gynecol Obstet. 1974;139:69–81
  4. Gallagher EJ. Clinical utility of likelihood ratios. Ann Emerg Med. 1998;31:391–397
  5. Bayes T. An essay towards solving a problem in the doctrine of chances. Philos Trans R Soc Lond. 1763;53:370–418
  6. Lewis RJ, Wears RL. An introduction to the Bayesian analysis of clinical trials. Ann Emerg Med. 1993;22:1328–1336
  7. Gallagher EJ. The problem with sensitivity and specificity…. Ann Emerg Med. 2003;42:298–303
  8. Charniak E. Bayesian networks without tears. AI Magazine. 1991;12:50–63
  9. Lucas PJ, van der Gaag LC, Abu-Hanna A. Bayesian networks in biomedicine and health-care. Artif Intell Med. 2004;30:201–214
  10. Heckerman D. A tutorial on learning with Bayesian networks. Technical report MSR-TR-95-06. Microsoft Research, March 1995. (Revised November 1996.) Available at: ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf. Accessed October 3, 2004.

Funding and support: The authors report this study did not receive any outside funding or support. Reprints are not available from the authors.

PII: S0196-0644(04)01499-4

doi:10.1016/j.annemergmed.2004.10.006

Refers to article:

  • Derivation and Validation of a Bayesian Network to Predict Pretest Probability of Venous Thromboembolism , 20 January 2005

    Jeffrey A. Kline, Andrew J. Novobilski, Christopher Kabrhel, Peter B. Richman, D. Mark Courtney
    Annals of Emergency Medicine March 2005 (Vol. 45, Issue 3, Pages 282-290)
