Probability and diagnostic testing

In this page you will find some examples from the lecture. You are not required to do your exercises with R, however R can be used as a calculator.

Sensitivity = TP / P = TP / (TP+FN)
Specificity = TN / N = TN / (TN+FP)
Positive predictive value PPV = TP / test positive = TP / (TP + FP)

In addition,

false positive rate (FPR) is 1 - specificity
false negative rate (FNR) is 1 - sensitivity

When you have prevalence of the disease, the positive and negative predictive value PPV, NPV are

\[PPV = \frac{sens \times prev}{sens \times prev + (1-spec) \times (1 - prev)}\]

\[NPV = \frac{spec \times (1-prev)}{(1-sens) \times prev + spec \times (1-prev)}\]

Coin toss example

You can replicate the coin toss example in the probability section. Modify $n$ (number of tosses) and $p$ (probability of having 1 as outcome) to see what happens.

Mammography example

In the mammography example, we are given the counts of patients that fall into one of the four categories:

22 tested cancer really have cancer
331 tested no cancer really do not have cancer
16 tested cancer, but do not have cancer
3 tested no cancer, but really have cancer

According to the terminology introduced in class, we can create four variables to store the data.

tp <- 22
tn <- 331
fp <- 16
fn <- 3

We can find how many patients really have cancer, or not have cancer. Here positive and negative refer to the actual disease status, not thest test result.

# positive means positive condition - real disease status
positives <- tp + fn
positives

[1] 25

negatives <- tn + fp
negatives

[1] 347

Now find the sensitivity, specificity, positive predictive value.

# sensitivity: tp / positives
tp/positives

[1] 0.88

# specificity: tn / negatives
tn/negatives

[1] 0.9538905

# positive predictive value ppv
# tp / positive test
tp / (tp+fp)

[1] 0.5789474

HIV example

The HIV example from the class demonstrates the how prevalence affects metrics of diagnostic tests.

To start, we set prevalence to 0.001.

# prevalence 0.1%
prevalence <- 0.001

We were given some terms that are not sensitivity and specifity, but it is straightfoward to translate into terms we know.

# false positive rate 0.2%
# fpr is 1-specificity
specificity <- 1-0.002
specificity

[1] 0.998

# false negative rate 2%
# fnr is 1-sensitivity
sensitivity <- 1-0.02
sensitivity

[1] 0.98

Find positive predictive value from the formula.

a <- sensitivity * prevalence
b <- sensitivity * prevalence + (1-specificity) * (1-prevalence)
a/b

[1] 0.3290799

# you can skip the step where you name a and b
(sensitivity * prevalence) / (sensitivity * prevalence + (1-specificity) * (1-prevalence))

[1] 0.3290799

Now test out different values of prevalence. Replace the value of prevalence, then run the following code by pressing the button.

You can try the following prevalence values: 0.01%, 0.1%, 1%, 10%.

--- title: "Probability and diagnostic testing" engine: knitr format: html: code-fold: false code-tools: true editor: source filters: - webr webr: channel-type: "post-message" --- In this page you will find some examples from the lecture. You are not required to do your exercises with R, however R can be used as a calculator. * Sensitivity = TP / P = TP / (TP+FN) * Specificity = TN / N = TN / (TN+FP) * Positive predictive value PPV = TP / test positive = TP / (TP + FP) In addition, * false positive rate (FPR) is 1 - specificity * false negative rate (FNR) is 1 - sensitivity When you have prevalence of the disease, the positive and negative predictive value PPV, NPV are $$PPV = \frac{sens \times prev}{sens \times prev + (1-spec) \times (1 - prev)}$$ $$NPV = \frac{spec \times (1-prev)}{(1-sens) \times prev + spec \times (1-prev)}$$ # Coin toss example You can replicate the coin toss example in the probability section. Modify $n$ (number of tosses) and $p$ (probability of having 1 as outcome) to see what happens. ```{webr-r} #| label: coin #| warning: false #| echo: true # coin tosses: modify n to 50, 100 and run the code n <- 10 p <- 0.5 # prob of Heads (1s) s <- 1 coin <- rbinom(n = n, size = s, prob = p) coin # outcome, 1 is heads, 0 is tails # count how many 1s n_heads <- cumsum(coin) n_tosses <- 1:n # compute the cumulative probability of all first 1, 2, ..., n tosses p_cumulative <- n_heads/n_tosses p_cumulative <- round(p_cumulative, digits = 2) # take 2 digits # plot the probability plot(p_cumulative, ylim = c(0,1), pch = 20, xlab = 'Toss', ylab = 'Frequency', main = 'Coin tosses cumulative probability of H (1)') # highlight the 'true' probability abline(h = p, col = 'red') ``` # Mammography example In the mammography example, we are given the counts of patients that fall into one of the four categories: * 22 tested cancer really have cancer * 331 tested no cancer really do not have cancer * 16 tested cancer, but do not have cancer * 3 tested no cancer, but really have cancer According to the terminology introduced in class, we can create four variables to store the data. ```{r} #| label: mamm1 #| warning: false #| echo: true tp <- 22 tn <- 331 fp <- 16 fn <- 3 ``` We can find how many patients really have cancer, or not have cancer. Here **positive** and **negative** refer to the actual disease status, not thest test result. ```{r} #| label: mamm2 #| warning: false #| echo: true # positive means positive condition - real disease status positives <- tp + fn positives negatives <- tn + fp negatives ``` Now find the **sensitivity, specificity, positive predictive value**. ```{r} #| label: mamm3 #| warning: false #| echo: true # sensitivity: tp / positives tp/positives # specificity: tn / negatives tn/negatives # positive predictive value ppv # tp / positive test tp / (tp+fp) ``` # HIV example The HIV example from the class demonstrates the how prevalence affects metrics of diagnostic tests. To start, we set prevalence to 0.001. ```{r} #| label: hiv1 #| warning: false #| echo: true # prevalence 0.1% prevalence <- 0.001 ``` We were given some terms that are not sensitivity and specifity, but it is straightfoward to translate into terms we know. ```{r} #| label: hiv2 #| warning: false #| echo: true # false positive rate 0.2% # fpr is 1-specificity specificity <- 1-0.002 specificity # false negative rate 2% # fnr is 1-sensitivity sensitivity <- 1-0.02 sensitivity ``` Find positive predictive value from the formula. ```{r} #| label: hiv3 #| warning: false #| echo: true a <- sensitivity * prevalence b <- sensitivity * prevalence + (1-specificity) * (1-prevalence) a/b # you can skip the step where you name a and b (sensitivity * prevalence) / (sensitivity * prevalence + (1-specificity) * (1-prevalence)) ``` Now test out different values of prevalence. Replace the value of prevalence, then run the following code by pressing the button. You can try the following prevalence values: 0.01%, 0.1%, 1%, 10%. ```{webr-r} #| label: prev #| warning: false #| echo: true # delete and replace the value: prevalence <- 0.01 # other two metrics are kept as before specificity <- 0.998 sensitivity <- 0.98 # run the formula for ppv (sensitivity * prevalence) / (sensitivity * prevalence + (1-specificity) * (1-prevalence)) # npv (specificity * (1-prevalence)) / ((1-sensitivity) * prevalence + specificity * (1-prevalence)) ```