Bayes Rule: Cancer testing
by RS  admin@robinsnyder.com : 1024 x 640


1. Bayes Rule: Cancer testing
Bayes' rule is named after Thomas Bayes (minister, mathematician) who established a mathematical basis for probability inference. Bayesian logic is a foundation of many decision procedures used in the area of artificial intelligence and intelligent systems.

2. Pertinent ethical question
Why should a business keep tests that may have stigmas attached to them confidential? For example, tests such as drug tests, AIDS tests, cancer tests, etc.

3. Quantitative reasoning
Instead of stating opinions, let us quantify the problem and see what qualitative results in the form of general guidance follow from the quantitative analysis.

4. The problem
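Assume that the occurrence of cancer in the general population is 0.4%. You take a test for cancer. The test is 98.0% accurate in predicting cancer. Here, "98.0% accurate" means that the test gives the correct result 98.0% of the time, both for people who have cancer and for people who do not. The test results are positive. Given that this is the only information available, what is the probability that you have cancer?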

Adapted from Paulos, J. (1988). Innumeracy: Mathematical illiteracy and its consequences. New York: Hill and Wang (pp. 89-90).

5. Decision tree
That is, you take a test for cancer. The test results are positive. This is the event Tp (test positive).

What assumptions are we making?

6. Assumptions
We are assuming that the information given is the only information available about the situation.

In real life, a doctor would use other additional information to make a diagnosis. Let us quantify the cancer test problem using Bayes' rule and use sensitivity analysis to analyze the solution.

7. Cancer and 2-valued logic
Everyone either has cancer or does not have cancer (two-valued logical assumption). Let Cy be the event that a person has cancer and Cn the event that a person does not have cancer. Events Cy and Cn are mutually exclusive and collectively exhaustive.

This is two-valued logic, as opposed to multi-valued (i.e., fuzzy) logic.

8. Cancer properties
What do we know about cancer probabilities? Question: Do we know P(Cy) and do we know P(Cn)?

9. Cancer probabilities
Yes, we know that P(Cy) is 0.004 and that P(Cn) is 0.996. Common error: converting 0.4% to 0.04, not 0.004.

Note: The sizes of the sets in a Venn diagram do not represent their relative weights.

10. Tests and 2-valued logic
A test T is either positive (cancer) or negative (free of cancer). Let Tp be the event that the test is positive and Tn the event that the test is negative. Events Tp and Tn are mutually exclusive and collectively exhaustive.

11. Test properties
We are making certain assumptions. Question: Do we know P(Tp)? No, we only know that the probability that a person tests positive for cancer given that they have cancer is 0.98, or 98.0%. This is the conditional probability P(Tp | Cy) and not the probability P(Tp).

This is what we want to determine/calculate!

12. Conditional probability
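Prob(A | B) is the probability that A is true given that B is true. In real life, not all test performance results are symmetric. That is, the probability of a correct result for people who have the condition need not equal the probability of a correct result for people who do not.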

13. Predictions and errors
The prediction is the result of a test T for cancer.

The result can be true or false.

14. Error categories

15. Cancer test predictions
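Combining the test prediction with the actual condition gives four cases.

Cy and Tp: true-positive (the test correctly indicates cancer).
Cn and Tn: true-negative (the test correctly indicates no cancer).
Cn and Tp: false-positive (the test indicates cancer, but there is none).
Cy and Tn: false-negative (the test indicates no cancer, but there is cancer).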

16. Errors
True-positives and true-negatives are not logical problems, although in this case a true-positive is a tremendous personal problem or tragedy.

False-positives and false-negatives are logical problems and can also be tremendous personal problems.

Question: Why can a false-negative result be a problem? A false-negative result means that the test indicates that you do not have cancer, but you do have cancer.

Question: Why can a false-positive result be a problem?

17. Venn diagram
A false-positive result means that the test indicates that you do have cancer, but you do not have cancer.

Here is the Venn diagram showing the four possibilities.
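The four outcome names can be pinned down with a few lines of Python. This is only an illustrative sketch: the function name `classify` and its boolean parameters are assumptions made for this example and are not part of the original material.

```python
def classify(has_cancer: bool, test_positive: bool) -> str:
    """Name the outcome for one person, given the truth and the test result."""
    if test_positive:
        # The test says "cancer": correct only if the person has cancer.
        return "true-positive" if has_cancer else "false-positive"
    # The test says "no cancer": correct only if the person is cancer free.
    return "false-negative" if has_cancer else "true-negative"


# Enumerate all four combinations of (Cn/Cy, Tn/Tp).
for has_cancer in (False, True):
    for test_positive in (False, True):
        condition = "Cy" if has_cancer else "Cn"
        result = "Tp" if test_positive else "Tn"
        print(condition, result, classify(has_cancer, test_positive))
```

The two "false" outcomes are the logical errors discussed above; the two "true" outcomes are correct predictions.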

18. Venn diagram
[Figure: Venn diagram of the events Cn, Cy, Tn, and Tp]

19. Conditional probability
We have P(Tp | Cy). We want P(Cy | Tp).

In general, P( Cy | Tp ) is not the same as P( Tp | Cy ).

The following are not the same.

20. Spanish and Spain
The probability that someone speaks Spanish given that they live in Spain is very high.

The probability that someone lives in Spain given that they speak Spanish is very low.
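A small numeric sketch makes the asymmetry concrete. The three counts below are made-up round numbers chosen only for illustration; they are assumptions for this example, not census data and not figures from the original material.

```python
# HYPOTHETICAL round counts, in millions of people (illustration only).
live_in_spain = 47                # assumed number of people living in Spain
speak_spanish_in_spain = 46       # assumed number of those who speak Spanish
speak_spanish_worldwide = 500     # assumed number of Spanish speakers anywhere

# P(speaks Spanish | lives in Spain): condition on the residents of Spain.
p_spanish_given_spain = speak_spanish_in_spain / live_in_spain

# P(lives in Spain | speaks Spanish): condition on all Spanish speakers.
p_spain_given_spanish = speak_spanish_in_spain / speak_spanish_worldwide

print(f"P(Spanish | Spain) = {p_spanish_given_spain:.3f}")
print(f"P(Spain | Spanish) = {p_spain_given_spanish:.3f}")
```

Both conditional probabilities share the same numerator (people who live in Spain and speak Spanish). Only the denominator changes, and that is what makes the two values so different.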

21. The original problem
The probability that one has cancer, given that one has tested positive for cancer, is P(Cy | Tp). You are given P(Tp | Cy). What is P(Cy | Tp)?

Question: Can we determine P(Cy | Tp)?

22. Algebra
Let us expand our definitions and see what happens.

We have: P(Tp | Cy) = P(Tp Cy) / P(Cy)

We want: P(Cy | Tp) = P(Tp Cy) / P(Tp)

Question: Can we get it?

23. More algebra
Note the common numerator P(Tp Cy) on the right-hand sides of the definitions of P(Tp | Cy) and P(Cy | Tp). Let us move terms around in order to eliminate it.

P(Tp | Cy) = 0.98 = P(Tp Cy) / P(Cy) and P(Cy | Tp) = P(Tp Cy) / P(Tp)

Question: But what is P(Tp)?

24. Truth table tautology
We do not know, but we can use a truth table tautology to indirectly determine P(Tp).

Note: Cn = ¬ Cy

Cy Tp | Tp = ((Cy & Tp) | ((¬ Cy) & Tp))
-----------------------------------------------
0 0 | 0 1 ((0 0 0 ) 0 (( 1 0 ) 0 0 ))
0 1 | 1 1 ((0 0 1 ) 1 (( 1 0 ) 1 1 ))
1 0 | 0 1 ((1 0 0 ) 0 (( 0 1 ) 0 0 ))
1 1 | 1 1 ((1 1 1 ) 1 (( 0 1 ) 0 1 ))
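The tautology Tp = ((Cy & Tp) | ((¬ Cy) & Tp)) can also be checked mechanically by trying every combination of truth values. The following is a minimal sketch; the helper name `rhs` is an assumption made for this example.

```python
from itertools import product


def rhs(cy: bool, tp: bool) -> bool:
    """Right-hand side of the identity: (Cy & Tp) | ((not Cy) & Tp)."""
    return (cy and tp) or ((not cy) and tp)


# Print one row per assignment: Cy, Tp, left-hand side, right-hand side.
for cy, tp in product((False, True), repeat=2):
    print(int(cy), int(tp), "|", int(tp), int(rhs(cy, tp)))

# The identity is a tautology if both sides agree on every row.
is_tautology = all(tp == rhs(cy, tp) for cy, tp in product((False, True), repeat=2))
print("tautology:", is_tautology)
```

Because the two terms (Cy & Tp) and ((¬ Cy) & Tp) cannot both be true at once, the probability of Tp is the sum of the probabilities of the two intersections.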


25. Algebra
P(Tp) = P(Cy Tp) + P(Cn Tp)

Question: How do we find the relevant intersections?

26. Conditional probability rule
By using the conditional probability rule.

P(Tp | Cy) = P(Cy Tp) / P(Cy)

Multiplying both sides by P(Cy):

P(Tp | Cy) * P(Cy) = P(Cy Tp)

Reverse the equality:

P(Cy Tp) = P(Tp | Cy) * P(Cy)

27. Intersection probabilities
We can now calculate the intersection probabilities:

28. Intersection probabilities
P(Cn Tn) = 0.98 * 0.996 ; P(Cy Tn) = 0.02 * 0.004

P(Cn Tp) = 0.02 * 0.996 ; P(Cy Tp) = 0.98 * 0.004
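As a sanity check, the four intersection probabilities can be computed and summed. This is a sketch under the problem's assumption that the test is correct 98.0% of the time for both groups, so a cancer-free person tests negative with probability 0.98 and positive with probability 0.02. The variable names are assumptions made for this example.

```python
p_cy = 0.004          # P(Cy): prior probability of cancer
p_cn = 1.0 - p_cy     # P(Cn): prior probability of no cancer
accuracy = 0.98       # assumed probability that the test is correct, in either group

# Each intersection is (conditional probability of the result) * (prior).
joint = {
    ("Cy", "Tp"): accuracy * p_cy,          # correct result for a person with cancer
    ("Cy", "Tn"): (1.0 - accuracy) * p_cy,  # missed cancer
    ("Cn", "Tp"): (1.0 - accuracy) * p_cn,  # false alarm
    ("Cn", "Tn"): accuracy * p_cn,          # correct result for a cancer-free person
}

for (condition, result), p in joint.items():
    print(f"P({condition} {result}) = {p:.5f}")

total = sum(joint.values())                        # should be 1 (exhaustive cases)
p_tp = joint[("Cy", "Tp")] + joint[("Cn", "Tp")]   # P(Tp) from the two Tp cells
print(f"total = {total:.5f}, P(Tp) = {p_tp:.5f}")
```

The four cells cover every possibility exactly once, so they must sum to 1. Any grid whose cells do not sum to 1 contains a misplaced factor.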

Question: Which Intersection probabilities do we need?

29. Bayes' Rule development
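Since the test is positive, we need P(Cn Tp) and we need P(Cy Tp).

P(Cy | Tp) = P(Cy Tp) / (P(Cy Tp) + P(Cn Tp))

Now expand the terms.

P(Cy | Tp) = (P(Tp | Cy) * P(Cy)) / ((P(Tp | Cy) * P(Cy)) + (P(Tp | Cn) * P(Cn)))

Now simplify.

P(Cy | Tp) = (P(Tp | Cy) * P(Cy)) / P(Tp)

Is this simple enough?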

Question: Why is the last formula the most commonly used one?

30. Bayes Rule
Because it is the most impressive. Also, in the real world, people are more comfortable estimating conditional probabilities than intersection probabilities.

We have a simpler form of Bayes Rule.

P(Cy | Tp) = (P(Tp | Cy) * P(Cy)) / P(Tp)

[Figure: Bayes rule Venn diagram]

31. Direct calculation
P(Cy | Tp) = (0.98 * 0.004) / ((0.98 * 0.004) + (0.02 * 0.996)) = 0.00392 / 0.02384 = 0.164 (approximately)

It is not 98.0%!
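The direct calculation is easy to reproduce. The sketch below assumes, as the problem does, that the same accuracy applies to people with and without cancer; the function name `posterior` and its parameter names are assumptions made for this example.

```python
def posterior(prior: float, accuracy: float) -> float:
    """P(Cy | Tp) for a test that is correct with probability `accuracy`
    in both groups, given the prior P(Cy) = `prior`."""
    true_positive = accuracy * prior                    # P(Cy Tp)
    false_positive = (1.0 - accuracy) * (1.0 - prior)   # P(Cn Tp)
    return true_positive / (true_positive + false_positive)


p = posterior(prior=0.004, accuracy=0.98)
print(f"P(Cy | Tp) = {p:.4f}")  # prints "P(Cy | Tp) = 0.1644"
```

The false-positive term dominates the denominator because cancer-free people are so much more numerous than people with cancer.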

32. Epidemiology
Well-known result: This happens when

33. Medical diagnoses
We assumed that the information given was the only information available.

In most medical diagnoses, there is often additional (nonindependent) information with which to make diagnostic decisions.

34. Ethical inferences
Question: Why should a business keep tests that may have stigmas attached to them confidential? For example, tests such as drug tests, AIDS tests, cancer tests, etc.

A business should keep such tests confidential because the results must be interpreted appropriately. In this case, the probability that the person has cancer given that they tested positive for cancer is not 98.0% but about 16.4%.

Question: What is the appropriate action to take when someone tests positive on a test that has a stigma attached to it? 1. Do additional testing. 2. Keep the information confidential.

Do such situations occur in real life?

35. Real life results
What does this mean? Our results were for a test with an accuracy of 0.98.
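One way to see what the result means is to count people instead of multiplying probabilities. The sketch below uses a hypothetical population of 100,000 people; the population size is an assumption chosen only to give whole numbers, and the variable names are illustrative.

```python
population = 100_000   # HYPOTHETICAL population size, chosen for round counts
prevalence = 0.004     # 0.4% of the population has cancer
accuracy = 0.98        # test is correct 98.0% of the time in both groups

with_cancer = round(population * prevalence)            # people who have cancer
without_cancer = population - with_cancer               # people who do not

true_positives = round(with_cancer * accuracy)          # cancer, test positive
false_positives = round(without_cancer * (1 - accuracy))  # no cancer, test positive

all_positives = true_positives + false_positives
share_with_cancer = true_positives / all_positives

print(f"{with_cancer} people have cancer; {true_positives} of them test positive")
print(f"{without_cancer} people do not; {false_positives} of them test positive")
print(f"{true_positives} of {all_positives} positive tests are real: "
      f"{share_with_cancer:.1%}")
```

Under these assumptions, the people without cancer who test positive outnumber the people with cancer who test positive by roughly five to one, which is why a positive result is far from conclusive.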

36. Stacked bar chart
A stacked bar chart displays the before and after results of a positive test for cancer with a test accuracy of 0.98.

37. Usefulness of a test
Notice that a positive test for cancer has increased the probability from 0.004 to about 0.164, an increase in probability by a factor of about 41.

A positive test for cancer dramatically increases the probability that the patient does have cancer, but the probability is not 98.0%, which would be an increase by a factor of 245.
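The two factors can be checked directly from the numbers in the problem. This is a short sketch; the variable names are assumptions made for this example.

```python
prior = 0.004      # P(Cy) before testing
accuracy = 0.98    # test accuracy, assumed the same in both groups

# P(Cy | Tp) from Bayes' rule.
posterior = (accuracy * prior) / (accuracy * prior + (1 - accuracy) * (1 - prior))

actual_factor = posterior / prior   # how much a positive test raises the probability
naive_factor = accuracy / prior     # the factor if the answer really were 98.0%

print(f"posterior     = {posterior:.4f}")
print(f"actual factor = {actual_factor:.1f}")
print(f"naive factor  = {naive_factor:.1f}")
```

The gap between the two factors is a measure of how misleading it is to read the test accuracy as the probability of having cancer.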

38. Sensitivity analysis
A sensitivity analysis answers the question, "What happens if input parameters of the problem are varied?".

We can perform a sensitivity analysis of this problem by asking the question, "What happens if the accuracy of the test varies?".

39. Generalizing the result
To do a sensitivity analysis, we need to generalize our result for a test accuracy of 98.0% to handle a range of accuracies. For example, 90.0% to 100.0%.

40. Specific result
Specific result: P(Cy | Tp) = (0.98 * 0.004) / ((0.98 * 0.004) + (0.02 * 0.996))

41. General result
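General result, writing a for the test accuracy (so that a = 0.98 gives the specific result):

P(Cy | Tp) = (a * 0.004) / ((a * 0.004) + ((1 - a) * 0.996))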

42. Sensitivity analysis
The probability increases to 100.0% only as the test accuracy approaches 100.0%.

Question: What happens as the accuracy of the test decreases from 100.0%? The probability that one has cancer given that the test is positive decreases rapidly, to about 28% at a test accuracy of 99.0% and to about 3.5% at a test accuracy of 90.0%.
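The sensitivity analysis can be tabulated by sweeping the test accuracy from 90.0% to 100.0% while holding the prior at 0.004. This is a minimal sketch; the function name `posterior_given_positive` and the one-percentage-point step are assumptions made for this example.

```python
def posterior_given_positive(accuracy: float, prior: float = 0.004) -> float:
    """P(Cy | Tp) when the test is correct with probability `accuracy`
    for both people with cancer and people without."""
    true_positive = accuracy * prior
    false_positive = (1.0 - accuracy) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Accuracies 0.90, 0.91, ..., 1.00 (built from integers to avoid drift).
accuracies = [percent / 100 for percent in range(90, 101)]
table = [(a, posterior_given_positive(a)) for a in accuracies]

for a, p in table:
    print(f"accuracy {a:.2f} -> P(Cy | Tp) = {p:.3f}")
```

The table shows that most of the rise toward 100.0% happens in the last one or two percentage points of accuracy, because the false-positive term shrinks to zero only at perfect accuracy.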

How do these results apply to lie detector tests where the accuracy of the test is much less?

Note: This analysis does not include factors such as the probability that the lab made a mistake. Factors such as these are important in DNA evidence used in legal court cases.

43. Bayesian networks
In general, Bayes' Rule allows information of the form P(B | A) to be converted to the form P(A | B).
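The same inversion works for any set of mutually exclusive and collectively exhaustive hypotheses, not just Cy and Cn. The sketch below is one possible way to write it; the function name `bayes_invert` and the dictionary-based interface are assumptions made for this example and are not taken from any particular library.

```python
def bayes_invert(priors: dict, likelihoods: dict) -> dict:
    """Convert P(evidence | hypothesis) into P(hypothesis | evidence).

    priors:      P(hypothesis) for each hypothesis (should sum to 1)
    likelihoods: P(evidence | hypothesis) for each hypothesis
    """
    # Intersection probabilities P(hypothesis and evidence).
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # P(evidence) is the sum over all hypotheses.
    evidence = sum(joint.values())
    return {h: j / evidence for h, j in joint.items()}


# The cancer test as a special case: the evidence is a positive test (Tp).
posteriors = bayes_invert(
    priors={"Cy": 0.004, "Cn": 0.996},
    likelihoods={"Cy": 0.98, "Cn": 0.02},   # P(Tp | Cy) and P(Tp | Cn)
)
print(posteriors)
```

With only two hypotheses this reduces to the formula used for the cancer test; with more hypotheses the denominator simply gains more terms.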

44. Expert systems applications

45. End of page
