\( X \) |
|
Random or stochastic variable that can take on different values,
often written as a set: \( X = \{ x_1, x_2, \cdots, x_N \} \).
Sometimes the numbers have different probabilities:
\( p_1, p_2, \cdots, p_N \).
|
Mean |
|
Arithmetic mean or average is a common way of measuring the centre
of many numbers. It is denoted \( \mu \) or \( E[X] \) and, when all
\( N \) numbers are equally likely, defined as:
\( \mu = E[X] = \frac{1}{N} \sum_{t=1}^N x_t \)
|
Var |
|
Variance is a common way of measuring the spread of many numbers.
It is denoted \( \sigma^2 \) or \( Var(X) \) and defined as:
\( \sigma^2 = Var(X) = E[X^2] - E[X]^2 =
\frac{1}{N} \sum_{t=1}^N ( x_t - \mu )^2 \)
|
Std |
|
Standard deviation is also a common way of measuring the spread of
many numbers. It is simply the square root of the variance, which has
the advantage of having the same units (such as $ or millions) as the
original numbers, whereas the variance squares those units.
It is denoted \( \sigma \) or \( Std(X) \) and defined as:
\( Std(X) = \sqrt{Var(X)} = \sqrt{\sigma^2} = \sigma \)
|
Median |
|
When the numbers are sorted from smallest to largest, then the median
is the middle number in the list. For example, if the sorted numbers
are \( [1, 3, 7, 9, 12, 15, 27, 78, 1018] \) then the median is 12.
The median is generally more robust to extreme outliers than the
arithmetic mean, which is 130 in this example because of the extreme
outlier 1018. Without that outlier the arithmetic mean would only be
19. If the list of numbers has an even number of elements, then
the median is the average of the two middle elements.
|
Quartile |
|
When the numbers are sorted from smallest to largest, then the three
quartiles are the numbers that separate the list into four equally
large parts, and the 2nd quartile is the median.
|
IQR |
|
Inter-Quartile Range is the difference between the 3rd and 1st
quartiles. It measures the spread of the middle half of the numbers,
and is often used to identify outliers because it is more robust to
extreme values and skewed distributions than the mean and standard
deviation.
|
KDE |
|
Kernel Density Estimate is a method for smoothing observed
data-points to estimate a continuous probability distribution.
|
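
The definitions above can be illustrated with a minimal Python sketch that uses only the standard library and the example numbers from the Median entry. Several choices here are assumptions rather than part of the definitions: the population forms `pvariance` and `pstdev` are used because the formulas above divide by \( N \); the `"inclusive"` quartile method is only one of several common conventions; and `gaussian_kde` with its `bandwidth` parameter is a hypothetical helper, a Gaussian kernel being just one possible choice for a KDE.

```python
import math
import statistics

# Sorted example numbers from the Median entry, including the outlier 1018.
numbers = [1, 3, 7, 9, 12, 15, 27, 78, 1018]

mean = statistics.fmean(numbers)          # arithmetic mean
variance = statistics.pvariance(numbers)  # population variance (divides by N)
std = statistics.pstdev(numbers)          # square root of the variance
median = statistics.median(numbers)       # middle number of the sorted list

# Three quartiles; the "inclusive" method is an assumed convention and
# other conventions can give slightly different 1st and 3rd quartiles.
q1, q2, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
iqr = q3 - q1                             # inter-quartile range


def gaussian_kde(samples, x, bandwidth):
    """Hypothetical helper: Gaussian kernel density estimate at point x.

    Averages a normal density of width `bandwidth` centred on each sample.
    """
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))


# Estimated density at 10 for the numbers without the outlier; the
# bandwidth of 5.0 is an arbitrary illustrative choice.
density_at_10 = gaussian_kde(numbers[:-1], 10.0, bandwidth=5.0)

print(mean, median, q1, q3, iqr)
```

Because of the outlier, the mean (130) lies far above the median (12), while the quartiles and the IQR are unaffected by how large the outlier is. A smaller `bandwidth` makes the density estimate follow the individual data points more closely, and a larger one smooths it more.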