Bayesian Methods
Bayes' theorem will be invaluable throughout the analysis of Bayesian methods in machine learning. Suppose we observe $N$ data points jointly represented by $X$, and we know that they come from a distribution with parameters $\theta$. Then
\begin{align}
P(\theta \vert X) = \frac{P(X \vert \theta)P(\theta)}{P(X)}\end{align}
where
- $P(\theta \vert X)$ is called the posterior, the probability distribution of $\theta$ after observing the data
- $P(X \vert \theta)$ is called the likelihood, the probability of occurrence of the data under a given $\theta$
- $P(\theta)$ is called the prior, our beliefs about the parameters before seeing the data
- $P(X)$ is called the evidence (or marginal likelihood), obtained by marginalizing over $\theta$: $P(X) = \int P(X \vert \theta) P(\theta) \, d\theta$; it acts as a normalizing constant and is fixed for a given data set
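The formula above can be made concrete with a minimal sketch (a hypothetical example, not from the original text): inferring a coin's bias $\theta$ from observed flips, with each factor of Bayes' rule computed on a discrete grid of candidate $\theta$ values.

```python
import numpy as np
from math import comb

# Candidate parameter values theta and a uniform prior P(theta)
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / theta.size

# Observed data X: 7 heads in 10 flips (illustrative numbers)
heads, flips = 7, 10

# Likelihood P(X | theta): binomial probability of the observed counts
likelihood = comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)

# Evidence P(X): sum over the grid, a fixed normalizing constant
evidence = np.sum(likelihood * prior)

# Posterior P(theta | X) via Bayes' rule
posterior = likelihood * prior / evidence

print(posterior.sum())              # sums to 1 by construction
print(theta[np.argmax(posterior)])  # posterior mode
```

With a uniform prior the posterior mode coincides with the maximum-likelihood estimate, here $7/10 = 0.7$; a non-uniform prior would pull the mode toward regions of higher prior mass.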