Sample variance

Assume $X_1,\dots,X_n$ (with $n\ge 2$) are independent and identically distributed (i.i.d.) random variables with population mean $\mu$ and finite population variance $\sigma^2$. The sample variance of $\{X_i\}_{i=1}^n$ is \begin{equation} s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 \label{eq:sample-variance} \end{equation} where $n$ is the sample size, $X_i$ is the $i$-th random variable, and $\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$ is the sample mean.

Proof

\[ \DeclareMathOperator{\E}{\mathbb{E}} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Cov}{Cov} \]

Define the sum of squared deviations from the sample mean as \begin{equation} \begin{aligned} S &=\sum_{i=1}^n(X_i-\bar{X})^2\\ &=\sum_{i=1}^n[(X_i-\mu)-(\bar{X}-\mu)]^2\\ &=\sum_{i=1}^n(X_i-\mu)^2-2(\bar{X}-\mu)\sum_{i=1}^n(X_i-\mu)+n(\bar{X}-\mu)^2\\ &=\sum_{i=1}^n(X_i-\mu)^2-2n(\bar{X}-\mu)^2+n(\bar{X}-\mu)^2\\ &=\sum_{i=1}^n(X_i-\mu)^2-n(\bar{X}-\mu)^2. \end{aligned} \label{eq:S} \end{equation} The fourth line uses $\sum_{i=1}^n(X_i-\mu)=n\bar{X}-n\mu=n(\bar{X}-\mu)$.
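
To make the identity in Eq. \eqref{eq:S} concrete, here is a small numerical check. The observations $x=(1,2,6)$ and the value $\mu=2$ are made up purely for illustration; the identity holds for any choice of $\mu$. With $n=3$ and $\bar{x}=3$, \begin{equation*} \begin{aligned} \sum_{i=1}^3(x_i-\bar{x})^2 &=(-2)^2+(-1)^2+3^2=14,\\ \sum_{i=1}^3(x_i-\mu)^2-n(\bar{x}-\mu)^2 &=(1+0+16)-3\cdot 1=14. \end{aligned} \end{equation*}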

Taking the expectation of both sides of Eq. \eqref{eq:S} and using the linearity of expectation gives \begin{equation} \begin{aligned} \E[S] &=\E\left[\sum_{i=1}^n(X_i-\mu)^2-n(\bar{X}-\mu)^2\right]\\ &=\E\left[\sum_{i=1}^n(X_i-\mu)^2\right]-n\E[(\bar{X}-\mu)^2]. \end{aligned} \label{eq:E-S} \end{equation}

Because each $X_i$ has mean $\mu$ and variance $\sigma^2$, we can rewrite the first term in Eq. \eqref{eq:E-S} as \begin{equation} \E\left[\sum_{i=1}^n(X_i-\mu)^2\right]=\sum_{i=1}^n\E[(X_i-\mu)^2]=n\sigma^2 \label{eq:E-S-term1} \end{equation} and the second term as \begin{equation} \begin{split} n\E[(\bar{X}-\mu)^2] &=n\Var(\bar{X})\\ &=n\Var\left(\frac{1}{n}\sum_{i=1}^nX_i\right)\\ &=n\frac{1}{n^2}\Var\left(\sum_{i=1}^nX_i\right)\\ &=\frac{1}{n}\left[\sum_{i=1}^n\Var(X_i)+2\sum_{i<j}\Cov(X_i,X_j)\right]\\ &=\frac{1}{n}\sum_{i=1}^n\Var(X_i)\\ &=\frac{1}{n}n\sigma^2\\ &=\sigma^2. \end{split} \label{eq:E-S-term2} \end{equation} Here the first equality holds because $\E[\bar{X}]=\mu$, and the covariance terms vanish because the $X_i$ are independent, so $\Cov(X_i,X_j)=0$ for $i\neq j$.
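
As a quick sanity check of Eq. \eqref{eq:E-S-term2}, consider the smallest case $n=2$ (an illustrative special case, not a step of the proof). Independence gives $\Cov(X_1,X_2)=0$, so \begin{equation*} 2\Var\left(\frac{X_1+X_2}{2}\right) =\frac{2}{4}\left[\Var(X_1)+\Var(X_2)+2\Cov(X_1,X_2)\right] =\frac{1}{2}\left[\sigma^2+\sigma^2+0\right] =\sigma^2, \end{equation*} which agrees with the general result.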

Plugging Eqs. \eqref{eq:E-S-term1} and \eqref{eq:E-S-term2} into Eq. \eqref{eq:E-S} yields \begin{equation} \E[S]=n\sigma^2-\sigma^2=(n-1)\sigma^2. \label{eq:E-S-2} \end{equation}

Dividing Eq. \eqref{eq:E-S-2} by $n-1$ and substituting the definition of $S$ from Eq. \eqref{eq:S}, we obtain \begin{equation*} \sigma^2 =\frac{1}{n-1}\E[S] =\E\left[\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2\right] =\E[s^2], \end{equation*} where the sample variance is \begin{equation*} s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2. \end{equation*}

Therefore, the sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$, that is, $\E[s^2]=\sigma^2$.
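
The result can also be checked exactly on a small example. The sketch below is only an illustration under stated assumptions: the population is taken to be a fair six-sided die, the sample size is $n=3$, and the helper names `sample_variance` and `expected_sample_variance` are hypothetical, not part of any library. It enumerates all $6^3$ equally likely samples and averages the sample variance with exact rational arithmetic, once with the $n-1$ divisor and once with the divisor $n$ for comparison.

```python
from fractions import Fraction
from itertools import product


def sample_variance(xs, ddof=1):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


def expected_sample_variance(values, n, ddof=1):
    """Exact expectation over all equally likely i.i.d. samples of size n."""
    samples = list(product(values, repeat=n))
    return sum(sample_variance(s, ddof) for s in samples) / len(samples)


# Assumed population: a fair six-sided die (illustrative choice).
values = range(1, 7)
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

unbiased = expected_sample_variance(values, n=3)        # divisor n - 1
biased = expected_sample_variance(values, n=3, ddof=0)  # divisor n

print(sigma2, unbiased, biased)  # prints: 35/12 35/12 35/18
```

With the $n-1$ divisor the average equals $\sigma^2=35/12$ exactly, while dividing by $n$ gives $\frac{n-1}{n}\sigma^2=35/18$, consistent with Eq. \eqref{eq:E-S-2}.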