A lower bound of zero in most cases; negative values are rare (any examples?)
Presence of outliers, often high ones
Positive skewness
Non-normal distribution
Data reported only above or below some threshold (censored data)
Seasonal patterns
Autocorrelation
Dependence on other variables

4 Measures of central tendency

Mean (typically arithmetic mean)
Median
Mode

4.1 Arithmetic mean: A classical measure of central tendency

$\def\mean#1{\bar{#1}}$ \begin{equation} \mean{X}= \sum_{i=1}^n\frac{X_i}{n}= \sum_{i=1}^k\mean{X}_i\frac{n_i}{n}= \mean{X}_{(j)}\frac{n-1}{n}+X_j\frac{1}{n}= \mean{X}_{(j)}+\left(X_j-\mean{X}_{(j)}\right)\frac{1}{n} \end{equation}

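Here $\mean{X}_i$ and $n_i$ are the mean and size of the $i$th of $k$ groups, and $\mean{X}_{(j)}$ is the mean of the $n-1$ observations other than $X_j$. The mean is sensitive to outliers: by the last form, a single observation $X_j$ shifts it by $\left(X_j-\mean{X}_{(j)}\right)/n$.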
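The three forms of the mean above can be checked numerically. The following is a minimal sketch using dataset (b) from Example 1.1; the variable names and the split into two groups are arbitrary choices made for illustration.

```r
# Dataset (b) from Example 1.1; the last value (120) is the outlier
x <- c(2, 4, 8, 9, 11, 11, 120)
n <- length(x)

# Definition: sum of X_i / n
mean_def <- sum(x / n)

# Group form: weighted average of group means (two arbitrary groups)
g1 <- x[1:3]
g2 <- x[4:7]
mean_grouped <- mean(g1) * length(g1) / n + mean(g2) * length(g2) / n

# Leave-one-out form: start from the mean without X_j, then add X_j back
j <- 7
mean_without_j <- mean(x[-j])
mean_updated <- mean_without_j + (x[j] - mean_without_j) / n

print(c(definition = mean_def, grouped = mean_grouped, updated = mean_updated))
print(mean_without_j)  # mean of the six values other than the outlier
```

All three printed values should agree with `mean(x)`, while `mean_without_j` shows how far the single outlier pulls the mean.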

4.2 Median: A resistant measure of central tendency

$\def\median{\text{Median}}$ \begin{equation} \median= \begin{cases} X\left(\frac{n+1}{2}\right)&\text{if $n$ is odd}\\ \frac{1}{2}\left[X\left(\frac{n}{2}\right)+X\left(\frac{n}{2}+1\right)\right]&\text{if $n$ is even} \end{cases} \end{equation}

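Here $X(i)$ denotes the $i$th smallest observation. The median is less sensitive to outliers than the mean because it depends only on the middle one or two sorted values.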
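The odd and even cases of the median can be written out directly. This is a sketch for illustration (the function name `median_by_hand` is hypothetical); in practice the built-in `median()` does the same job.

```r
# Median from the sorted data, following the odd/even cases of the formula
median_by_hand <- function(x) {
  xs <- sort(x)            # sorted from smallest to largest
  n <- length(xs)
  if (n %% 2 == 1) {
    xs[(n + 1) / 2]                     # odd n: the middle value
  } else {
    (xs[n / 2] + xs[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

a <- c(2, 4, 8, 9, 11, 11, 12)    # dataset (a) from Example 1.1
b <- c(2, 4, 8, 9, 11, 11, 120)   # dataset (b) from Example 1.1

print(median_by_hand(a))
print(median_by_hand(b))       # same as for (a): the outlier has no effect
print(median_by_hand(a[-1]))   # even n: first value dropped
```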

4.3 Mode

The value that occurs most often in a discrete dataset.
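Base R has no function for the statistical mode (`mode()` returns the storage type of an object), so one has to be written. The sketch below is one possible version, assuming numeric data; the name `sample_mode` is hypothetical, and all tied values are returned.

```r
# Most frequent value(s) in a discrete dataset
sample_mode <- function(x) {
  counts <- table(x)                              # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]   # value(s) with the highest count
  as.numeric(modes)                               # assumes numeric data
}

a <- c(2, 4, 8, 9, 11, 11, 12)   # dataset (a) from Example 1.1
print(sample_mode(a))
```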

4.4 Geometric mean

$\def\gmean{\text{GM}}$ \begin{equation} \gmean=\left(\prod_{i=1}^nX_i\right)^{1/n} \end{equation} where $X_i>0$.

Useful for positively skewed datasets.
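Because $\left(\prod X_i\right)^{1/n}=\exp\left(\frac{1}{n}\sum\ln X_i\right)$, the geometric mean is usually computed on the log scale, which avoids overflow in the product for long records. A minimal sketch follows; the function name `geometric_mean` is hypothetical.

```r
# Geometric mean computed on the log scale
geometric_mean <- function(x) {
  stopifnot(all(x > 0))   # defined only for strictly positive values
  exp(mean(log(x)))
}

b <- c(2, 4, 8, 9, 11, 11, 120)   # dataset (b) from Example 1.1

gm_log <- geometric_mean(b)
gm_product <- prod(b)^(1 / length(b))   # direct form of the equation

print(c(log_form = gm_log, product_form = gm_product, arithmetic_mean = mean(b)))
```

For this positively skewed dataset the geometric mean falls well below the arithmetic mean.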

5 Measures of variability

5.1 Sample variance: A classical measure of variability

\begin{equation} s^2=\sum_{i=1}^n\frac{\left(X_i-\mean{X}\right)^2}{n-1} \end{equation}
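The sample variance can be computed term by term and compared with the built-in `var()`, which also divides by $n-1$. A short sketch using dataset (a) from Example 1.1:

```r
x <- c(2, 4, 8, 9, 11, 11, 12)   # dataset (a) from Example 1.1
n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1
s2 <- sum((x - mean(x))^2) / (n - 1)

print(c(by_hand = s2, builtin = var(x)))
print(c(sd_by_hand = sqrt(s2), sd_builtin = sd(x)))
```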

5.2 Interquartile range (IQR): A resistant measure of variability

Percentiles $P_j$ can be calculated from a dataset sorted from smallest to largest, $X_i$ for $i=1,\cdots,n$: \begin{equation} P_j=X_{(n+1)\cdot j} \end{equation} where $0<j<1$ is the cumulative fraction. The interquartile range (IQR) is then: $\def\iqr{\text{IQR}}$ \begin{equation} \iqr=P_{0.75}-P_{0.25} \end{equation}

What if $(n+1)\cdot j$ is not an integer? Interpolate linearly between the two neighboring sorted values. In hydrology we typically use the Weibull plotting position (type=6 in quantile() in R).
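The percentile rule with linear interpolation can be sketched as below and compared with `quantile(..., type = 6)`. The function name `weibull_quantile` is hypothetical, it takes a single fraction `p` at a time, and clamping positions outside $[1, n]$ to the smallest or largest value is an assumption chosen to match how `quantile()` treats the tails.

```r
# Percentile at position (n + 1) * p, with linear interpolation
weibull_quantile <- function(x, p) {
  xs <- sort(x)
  n <- length(xs)
  h <- (n + 1) * p          # fractional position in the sorted data
  h <- min(max(h, 1), n)    # clamp to the range of the data (assumption)
  lo <- floor(h)
  hi <- ceiling(h)
  xs[lo] + (h - lo) * (xs[hi] - xs[lo])
}

a <- c(2, 4, 8, 9, 11, 11, 12)   # dataset (a) from Example 1.1

p25 <- weibull_quantile(a, 0.25)   # position 2 (an integer)
p75 <- weibull_quantile(a, 0.75)   # position 6 (an integer)
p40 <- weibull_quantile(a, 0.40)   # position 3.2 (interpolated)
iqr_by_hand <- p75 - p25

print(c(P25 = p25, P40 = p40, P75 = p75, IQR = iqr_by_hand))
print(quantile(a, c(0.25, 0.40, 0.75), type = 6))
print(IQR(a, type = 6))
```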

5.3 Median absolute deviation (MAD): A resistant measure of variability

$\def\mad{\text{MAD}}$ \begin{equation} \mad(X)=\median{\left(\left|X_i-\median(X)\right|\right)} \end{equation}
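A short sketch of the MAD as defined above. Note that R's built-in `mad()` multiplies the result by 1.4826 by default, so `constant = 1` is needed to reproduce the equation as written; the name `mad_by_hand` is hypothetical.

```r
# Median of the absolute deviations from the median
mad_by_hand <- function(x) median(abs(x - median(x)))

a <- c(2, 4, 8, 9, 11, 11, 12)    # dataset (a) from Example 1.1
b <- c(2, 4, 8, 9, 11, 11, 120)   # dataset (b) from Example 1.1

print(c(a = mad_by_hand(a), b = mad_by_hand(b)))
print(mad(a, constant = 1))   # built-in, without the default scaling
```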

5.4 Coefficient of variation (CV): A nondimensional measure of variability

$\def\cv{\text{CV}}$ \begin{equation} \cv=\frac{s}{\mean{X}} \end{equation} where $s=\sqrt{s^2}$ is the sample standard deviation.

Useful for characterizing the degree of variability in datasets.
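Taking the CV as the sample standard deviation divided by the mean (its usual definition), a sketch in R shows that it does not depend on the units of the data. The function name `cv` is hypothetical.

```r
# Coefficient of variation: standard deviation relative to the mean
cv <- function(x) sd(x) / mean(x)

a <- c(2, 4, 8, 9, 11, 11, 12)    # dataset (a) from Example 1.1
b <- c(2, 4, 8, 9, 11, 11, 120)   # dataset (b) from Example 1.1

print(c(a = cv(a), b = cv(b)))
print(cv(a * 1000))   # same data in units 1000 times smaller: CV unchanged
```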

6 Example 1.1

Dataset (a): 2, 4, 8, 9, 11, 11, 12
Dataset (b): 2, 4, 8, 9, 11, 11, 120
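The two datasets differ only in the last value, so they are convenient for comparing classical and resistant measures side by side. The sketch below is one way to tabulate them with built-in functions; the helper name `measures` and the choice of `type = 6` and `constant = 1` are assumptions made to match the definitions in the previous sections.

```r
a <- c(2, 4, 8, 9, 11, 11, 12)    # dataset (a)
b <- c(2, 4, 8, 9, 11, 11, 120)   # dataset (b): last value replaced by an outlier

# Classical (mean, sd) and resistant (median, IQR, MAD) measures
measures <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    IQR    = IQR(x, type = 6),       # Weibull plotting position
    MAD    = mad(x, constant = 1))   # unscaled MAD
}

summary_table <- rbind(a = measures(a), b = measures(b))
print(round(summary_table, 2))
```

The mean and standard deviation change substantially between (a) and (b), while the median, IQR, and MAD are identical for the two datasets.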

7 Homework: Measures of hydrologic data

Define your own function for Eq. (1.10).
Solve Exercise 2.

Submit your R file with comments.

8 Exercise: Streamflow data cleaning and measures

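# Attempt 1: default arguments
q <- read.table("usgs_streamflow.txt")
q[1:10,]
# Attempt 2: take the column names from the header line
q <- read.table("usgs_streamflow.txt", header=TRUE)
q[1:10,]
colnames(q)
# Attempt 3: also skip the first line of the file; compare with attempt 2
q <- read.table("usgs_streamflow.txt", header=TRUE, skip=1)
q[1:10,]
# Final approach: read with the header, then drop the first row, which is not a data record
q <- read.table("usgs_streamflow.txt", header=TRUE)
q[1:10,]
q <- q[-1,]
q[1:10,]
rownames(q) <- 1:nrow(q)   # renumber the rows from 1
q[1:10,]
# Keep the first three column names and rename the last two
colnames(q)[1:3]
colnames(q) <- c(colnames(q)[1:3], "q", "qcode")
# Set entries that contain letters to NA, then check that none remain
q[grep("[A-Za-z]", q[,"q"]),"q"] <- NA
q[grep("[A-Za-z]", q[,"q"]),"q"]
# Convert to numeric (needed if the column was read as text or as a factor)
q[,"q"] <- as.numeric(as.character(q[,"q"]))
# Time series on linear and log scales, and a histogram
plot(q[,"q"], pch=20, type="l")
plot(q[,"q"], pch=20, type="l", log="y")
hist(q[,"q"])
# Classical measures, ignoring missing values
mean(q[,"q"], na.rm=TRUE)
var(q[,"q"], na.rm=TRUE)
sd(q[,"q"], na.rm=TRUE)
# Locate the missing values
is.na(q[,"q"])
which(is.na(q[,"q"]))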
