# Bayes’ theorem

Bayes’ theorem is formulated as \begin{equation} P(A|B) =\frac{P(A\cap B)}{P(B)} =\frac{\frac{P(A\cap B)}{P(A)}P(A)}{P(B)} =\frac{P(B|A)}{P(B)}P(A) \end{equation} where $P(\cdot)$ is the probability of an event, and $P(A)$ and $P(B)$ are assumed to be nonzero.

It is typically interpreted as \begin{equation} P(H|E) =\frac{P(E|H)}{P(E)}P(H) =\frac{L(H|E)}{P(E)}P(H) \propto L(H|E)P(H) \end{equation} where $H$ is a hypothesis, $E$ is evidence, $P(E|H)$ or $L(H|E)$ is the likelihood, $P(H)$ is the prior, and $P(H|E)$ is the posterior. The proportionality holds because $P(E)$ does not depend on $H$ and only acts as a normalizing constant.
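As a small numerical illustration of this interpretation, the sketch below computes a posterior for a binary hypothesis. The function name `posterior` and the numbers (a 1% prior, a 95% true-positive rate, and a 5% false-positive rate) are hypothetical and chosen only for illustration; $P(E)$ is expanded with the law of total probability.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H|E) from P(H), P(E|H), and P(E|not H)."""
    # Evidence P(E) = P(E|H) P(H) + P(E|not H) P(not H).
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E).
    return likelihood * prior / evidence


# Hypothetical numbers, not taken from any real data set.
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)
print(round(p, 3))  # roughly 0.161
```

Even with a strong likelihood, the posterior stays modest here because the prior is small.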

## Empirical Bayes method

Let $p(\cdot)$ be probability density functions. Given observed data $y\sim p(y|\theta)$, parameters $\theta\sim p(\theta|\eta)$, and hyperparameters $\eta\sim p(\eta)$, Bayes’ theorem gives \begin{equation} p(\theta|y) =\frac{p(y|\theta)}{p(y)}p(\theta) =\frac{p(y|\theta)}{p(y)}\int p(\theta|\eta)p(\eta)\,d\eta \label{eq:bayes_1} \end{equation} because marginalizing out $\eta$ yields \begin{equation} \int p(\theta|\eta)p(\eta)\,d\eta =\int\frac{p(\theta,\eta)}{p(\eta)}p(\eta)\,d\eta =\int p(\theta,\eta)\,d\eta =p(\theta). \end{equation}
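The marginalization over $\eta$ can be checked numerically. The sketch below assumes a hypothetical Gaussian hierarchy, $\eta\sim N(0,1)$ and $\theta|\eta\sim N(\eta,1)$, for which the marginal prior is known to be $\theta\sim N(0,2)$. The integral is approximated with the trapezoidal rule; the integration limits and grid size are arbitrary choices.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def marginal_prior(theta, lo=-10.0, hi=10.0, n=4000):
    """Approximate p(theta) = integral of p(theta|eta) p(eta) d(eta)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        eta = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoidal end-point weights
        # Assumed model: theta|eta ~ N(eta, 1) and eta ~ N(0, 1).
        total += weight * normal_pdf(theta, eta, 1.0) * normal_pdf(eta, 0.0, 1.0)
    return total * h


# The two printed values should agree closely.
print(marginal_prior(0.5), normal_pdf(0.5, 0.0, 2.0))
```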

Alternatively, \begin{equation} p(\theta|y) =\int\frac{p(\theta,\eta,y)}{p(y)}\,d\eta =\int\frac{p(\theta,\eta,y)}{p(\eta,y)}\frac{p(\eta,y)}{p(y)}\,d\eta =\int p(\theta|\eta,y)p(\eta|y)\,d\eta \label{eq:bayes_alt} \end{equation} and from Eq. \eqref{eq:bayes_1}, \begin{equation} \begin{aligned} p(\theta|y) &=\frac{p(y|\theta)}{p(y)}\int p(\theta|\eta)p(\eta)\,d\eta\\ &=\int\frac{p(y|\theta)p(\theta|\eta)}{p(y)}p(\eta)\,d\eta\\ &=\int\frac{p(y|\theta)p(\theta|\eta)}{p(y)}p(\eta)\frac{p(\eta,y)}{p(\eta,y)}\,d\eta\\ &=\int\frac{p(y|\theta)p(\theta|\eta)}{\frac{p(\eta,y)}{p(\eta)}}\frac{p(\eta,y)}{p(y)}\,d\eta\\ &=\int\frac{p(y|\theta)p(\theta|\eta)}{p(y|\eta)}p(\eta|y)\,d\eta\\ &=p(y|\theta)\int\frac{p(\theta|\eta)}{p(y|\eta)}p(\eta|y)\,d\eta. \end{aligned} \label{eq:bayes_2_1} \end{equation} Combining Eqs. \eqref{eq:bayes_alt} and \eqref{eq:bayes_2_1}, \begin{equation} p(\theta|y) =\int p(\theta|\eta,y)p(\eta|y)\,d\eta =p(y|\theta)\int\frac{p(\theta|\eta)}{p(y|\eta)}p(\eta|y)\,d\eta. \label{eq:bayes_2} \end{equation}
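The equality of the two forms in Eq. \eqref{eq:bayes_2} can be sanity-checked on a small discrete model, where the integrals become sums. The probability tables and the function name `posterior_three_ways` below are hypothetical; the only structural assumption is that the joint factorizes as $p(\eta)p(\theta|\eta)p(y|\theta)$.

```python
def posterior_three_ways(y):
    """Return (direct, mixture, factored) values of p(theta|y) for each theta."""
    # Hypothetical tables with two states per variable.
    p_eta = [0.3, 0.7]                             # p(eta)
    p_theta_given_eta = [[0.9, 0.1], [0.2, 0.8]]   # rows: eta, columns: theta
    p_y_given_theta = [[0.6, 0.4], [0.25, 0.75]]   # rows: theta, columns: y
    S = range(2)

    # Joint p(eta, theta, y) = p(eta) p(theta|eta) p(y|theta).
    joint = [[[p_eta[e] * p_theta_given_eta[e][t] * p_y_given_theta[t][v]
               for v in S] for t in S] for e in S]

    p_y = sum(joint[e][t][y] for e in S for t in S)          # p(y)
    p_eta_y = [sum(joint[e][t][y] for t in S) for e in S]    # p(eta, y)
    p_eta_given_y = [p_eta_y[e] / p_y for e in S]            # p(eta|y)
    p_y_given_eta = [p_eta_y[e] / p_eta[e] for e in S]       # p(y|eta)

    results = []
    for t in S:
        # Direct definition: p(theta|y) = p(theta, y) / p(y).
        direct = sum(joint[e][t][y] for e in S) / p_y
        # Sum over eta of p(theta|eta, y) p(eta|y).
        mixture = sum(joint[e][t][y] / p_eta_y[e] * p_eta_given_y[e] for e in S)
        # p(y|theta) times the sum over eta of p(theta|eta) / p(y|eta) * p(eta|y).
        factored = p_y_given_theta[t][y] * sum(
            p_theta_given_eta[e][t] / p_y_given_eta[e] * p_eta_given_y[e]
            for e in S)
        results.append((direct, mixture, factored))
    return results


for row in posterior_three_ways(y=1):
    print(row)  # the three entries in each row should match
```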

Note that $\eta$ and $y$ are conditionally independent given $\theta$. The model forms the chain $\eta\to\theta\to y$: $y$ is generated from $\theta$, which in turn is drawn from $p(\theta|\eta)$. Once $\theta$ is known, $\eta$ carries no additional information about $y$, and $y$ carries none about $\eta$. That is, \begin{equation} p(\eta|\theta,y)=p(\eta|\theta). \end{equation}
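This identity can be verified in a few lines, assuming the joint density factorizes along the chain as $p(\eta,\theta,y)=p(\eta)p(\theta|\eta)p(y|\theta)$ (a factorization implied by the model but not written out above): \begin{equation} \begin{aligned} p(\eta|\theta,y) &=\frac{p(\eta,\theta,y)}{\int p(\eta',\theta,y)\,d\eta'}\\ &=\frac{p(\eta)p(\theta|\eta)p(y|\theta)}{p(y|\theta)\int p(\eta')p(\theta|\eta')\,d\eta'}\\ &=\frac{p(\eta)p(\theta|\eta)}{p(\theta)}\\ &=p(\eta|\theta). \end{aligned} \end{equation}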

Now, $p(\eta|y)$ in Eq. \eqref{eq:bayes_2} can be expressed as \begin{equation} p(\eta|y) =\int p(\eta|\theta,y)p(\theta|y)\,d\theta =\int p(\eta|\theta)p(\theta|y)\,d\theta. \end{equation}
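As a rough check of this expression for $p(\eta|y)$, the sketch below compares it with the direct definition $p(\eta,y)/p(y)$ on a hypothetical discrete model, with sums in place of integrals. The tables and the function name `hyper_posterior_two_ways` are assumptions made for illustration.

```python
def hyper_posterior_two_ways(y):
    """Return p(eta|y) computed directly and via the sum over theta."""
    # Hypothetical tables with two states per variable.
    p_eta = [0.3, 0.7]                             # p(eta)
    p_theta_given_eta = [[0.9, 0.1], [0.2, 0.8]]   # rows: eta, columns: theta
    p_y_given_theta = [[0.6, 0.4], [0.25, 0.75]]   # rows: theta, columns: y
    S = range(2)

    # Joint p(eta, theta, y) = p(eta) p(theta|eta) p(y|theta).
    joint = [[[p_eta[e] * p_theta_given_eta[e][t] * p_y_given_theta[t][v]
               for v in S] for t in S] for e in S]

    p_y = sum(joint[e][t][y] for e in S for t in S)
    p_theta = [sum(p_eta[e] * p_theta_given_eta[e][t] for e in S) for t in S]
    p_theta_given_y = [sum(joint[e][t][y] for e in S) / p_y for t in S]
    # p(eta|theta) from Bayes' theorem; rows: eta, columns: theta.
    p_eta_given_theta = [[p_eta[e] * p_theta_given_eta[e][t] / p_theta[t]
                          for t in S] for e in S]

    # Direct definition: p(eta|y) = p(eta, y) / p(y).
    direct = [sum(joint[e][t][y] for t in S) / p_y for e in S]
    # Sum over theta of p(eta|theta) p(theta|y).
    via_theta = [sum(p_eta_given_theta[e][t] * p_theta_given_y[t] for t in S)
                 for e in S]
    return direct, via_theta


print(hyper_posterior_two_ways(y=1))  # the two lists should match
```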