Probability
Topics:
- Basic probability concepts
- Probability calculations
- Conditional and total probability
- Stochastic independence
- Bayes’ theorem
Books and resources:
- Aalen 3.1-3.10 (in Norwegian)
- Kirkwood and Sterne 14 (in English)
Basic concepts
A probability expresses a potential for something to happen.
It is an assessment of uncertainty in a situation or event.
It corresponds to the concept of risk in medicine.
Brief history
Blaise Pascal (17th century), together with Pierre de Fermat, founded probability theory: the set of basic rules for doing probability calculations. Their work was motivated by dice and card games.
Andrey Kolmogorov formulated the exact axioms of probability as late as 1933.
Two definitions of probability
Frequentist definition: The proportion of times (or frequency) that an event occurs in a large number of similar, repeated trials.
Bayesian definition: Degree of belief in the occurrence of an event.
Law of Large Numbers
“As an experiment is repeated over and over, the observed frequency approaches the true probability”.
The frequentist view interprets the relative frequency of an event in a large number of trials as its probability.
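A minimal simulation sketch in Python, assuming a fair coin with true probability 0.5, showing the observed frequency approaching the true probability:

```python
import random

# Simulate repeated coin flips: the observed frequency of heads
# should approach the true probability 0.5 as the number of
# trials grows.
random.seed(1)

for n in [10, 100, 1000, 10000, 100000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n = {n:6d}: observed frequency of heads = {heads / n:.4f}")
```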
Probability calculations
Stochastic trial, events and sample space.
A stochastic trial is characterized by an uncertain outcome.
All possible outcomes in a stochastic trial make up the sample space.
An event can be a single outcome, or a collection of single outcomes.
Each event has a probability of occurrence between 0 and 1. A probability equal to 0 means that the event can never occur, and a probability equal to 1 means that the event will certainly occur.
The sum of all probabilities in a sample space equals 1.
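For example, rolling a fair die is a stochastic trial with sample space \(S = \{1, 2, 3, 4, 5, 6\}\); the event "even number" is the collection of outcomes \(\{2, 4, 6\}\) and has probability \(3/6 = 1/2\).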
Venn diagram
Venn diagrams are often used to illustrate events. In such diagrams, \(A\) and \(B\) represent different events and \(S\) is the sample space.
Operators on events:
Union: \(A \cup B\)
Intersection: \(A \cap B\)
Complement: \(\bar{A}\)
Operators can be combined, e.g. \(A \cap \bar{B}\)
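These operators map directly onto set operations; a small Python sketch, using a die roll as an assumed illustrative sample space:

```python
# Events as Python sets; S is the sample space of one die roll.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}       # event: even number
B = {4, 5, 6}       # event: at least four

print(A | B)        # union A ∪ B        -> {2, 4, 5, 6}
print(A & B)        # intersection A ∩ B -> {4, 6}
print(S - A)        # complement of A    -> {1, 3, 5}
print(A & (S - B))  # A ∩ complement(B)  -> {2}
```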
Probability calculation rules
The probability of an event \(A\) is denoted by \(P(A)\). It has a value between 0 and 1. The probability of the whole sample space equals 1, \(P(S) = 1\).
Complement rule
\[P(A) + P(\bar{A}) = 1\]
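For example, if \(A\) is the event of rolling a six with a fair die, the probability of not rolling a six is \(P(\bar{A}) = 1 - P(A) = 1 - 1/6 = 5/6\).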
Additive rule
The probability that at least one of the events \(A\) or \(B\) occurs is \[P(A \cup B) = P(A) + P(B) - P(A \cap B)\] For disjoint events \(A\) and \(B\), \(P(A \cap B) = 0\). Hence \[P(A \cup B) = P(A) + P(B)\]
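For example, drawing one card from a standard deck, with \(A\) = "heart" and \(B\) = "king": \[P(A \cup B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52}\]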
Multiplicative rule
For independent events \(A\) and \(B\), the probabilities can be multiplied:
\[P(A \cap B) = P(A) \times P(B)\]
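For example, the probability that two independent rolls of a fair die both show a six is \(\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\).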
Conditional probability
What is the probability of the outcome \(A\) given that the event \(B\) has occurred? For example, what is the risk of becoming sick from COVID-19 given that your spouse already is?
The idea behind the conditional probability of \(A\) given \(B\), denoted \(P(A|B)\), is to consider \(B\) as the new sample space and rescale the probabilities of events within \(B\) so that the new sample space has total probability 1:
\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]
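For example, with a fair die, let \(A\) = "roll a 2" and \(B\) = "roll an even number": \[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}\]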
Stochastic independence
The events \(A\) and \(B\) are independent if \(P(A|B) = P(A)\).
Interpretation: probability of \(A\) is the same if we also know that \(B\) has occurred.
Probability calculations can be simplified if there is stochastic independence:
\[P(A|B) = \frac{P(A \cap B)}{P(B)} = P(A)\]
\[P(A \cap B) = P(A) \times P(B)\]
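Continuing the die example: with \(A\) = "even number" and \(B = \{3, 4\}\), we get \(P(A|B) = \frac{1/6}{1/3} = \frac{1}{2} = P(A)\), so \(A\) and \(B\) are independent.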
Total probability
The law of total probability expresses the probability of an outcome, \(P(A)\), in terms of the two complementary events \(B\) and \(\bar{B}\) through which it can be realized:
\[P(A) = P(A|B)P(B) + P(A| \bar{B})P( \bar{B})\]
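As a hypothetical illustration, let \(A\) = "positive test" and \(B\) = "diseased", with assumed values \(P(A|B) = 0.90\), \(P(A|\bar{B}) = 0.05\) and prevalence \(P(B) = 0.01\): \[P(A) = 0.90 \times 0.01 + 0.05 \times 0.99 = 0.0585\]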
Bayes’ theorem (also called Bayes’ law)
Given two events \(A\) and \(B\), Bayes’ theorem states that:
\[ P(B|A)=\frac{P(A|B)P(B)}{P(A)} \]
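Continuing the hypothetical test example above, the probability of disease given a positive test is \[P(B|A) = \frac{0.90 \times 0.01}{0.0585} \approx 0.15\] far lower than the test sensitivity, because the disease is rare.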
Bayesian statistics
In the Bayesian definition of probability, the probability of a given event represents one’s degree of belief in the occurrence of that event.
Where does Bayes come in? Bayes’ law is used to calculate such probabilities based on our prior belief and the available data:
\[ P(\theta|data) =\frac{P(data|\theta)P(\theta)}{P(data)} \]
- \(\theta\) refers to the parameters in your model (mean, variance, etc.)
- The prior distribution \(P(\theta)\) is where you put in your prior beliefs
- What you want to estimate is the so-called posterior distribution \(P(\theta|data)\), the probability distribution of the model parameters given your data.
- The more data you have, the more it will dominate over your prior belief.
- When you have prior knowledge about your problem, you get to actually use this information.
- When you know little (or nothing) about your problem, there are still many methodological advantages to using Bayesian statistics.
- Bayesian statistics is not really relevant for the simpler problems in this course, but you will probably at some point come across articles using Bayesian approaches.
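A minimal sketch of how the posterior updates a prior, assuming a coin-flip (Bernoulli) model with a conjugate Beta prior; all numbers here are hypothetical:

```python
# Bayesian update for a coin-flip probability theta, using the
# conjugate Beta-Bernoulli model. The prior and data are made up
# for illustration only.

prior_a, prior_b = 2, 2   # Beta(2, 2) prior: belief centred on 0.5
heads, tails = 70, 30     # hypothetical data: 70 heads in 100 flips

# Conjugacy: Beta prior + Bernoulli data -> Beta posterior
post_a = prior_a + heads
post_b = prior_b + tails

prior_mean = prior_a / (prior_a + prior_b)
post_mean = post_a / (post_a + post_b)

print(f"Prior mean:     {prior_mean:.3f}")  # 0.500
print(f"Posterior mean: {post_mean:.3f}")   # ~0.692: the data dominate the prior
```

With only a handful of flips the posterior mean would stay close to the prior's 0.5; with 100 flips the data dominate, illustrating the bullet points above.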