Briefly |
A measure of spread (such as the standard deviation) is a statistic that tells you about the variation in an attribute. |
Here are two common measures of spread:
standard deviation |
Use this seemingly odd procedure: take all the
deviations of the values from the overall mean. Square them. Then add
them up and divide by the number of values. (So you have the mean of the
squares.) Then take the square root of the result. This is also called
the root-mean-square deviation. In Fathom, use the function popStdDev( ) or s( ). What's the difference? |
interquartile range |
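This is the distance between the first and third quartiles, that is, the width of the middle half of the data. In Fathom, use the function iqr( ). About calculating IQR. |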
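If it helps to see the two measures spelled out, here is a minimal sketch in Python rather than Fathom. The function names and the sample data are made up for illustration. The quartile rule used here (Python's "inclusive" method) is an assumption and may not match the rule Fathom's iqr( ) uses, so small differences between tools are possible.

```python
import math
import statistics

def rms_deviation(values):
    """Root-mean-square deviation: divide by the number of values."""
    mean = sum(values) / len(values)
    squares = [(v - mean) ** 2 for v in values]   # square each deviation
    mean_of_squares = sum(squares) / len(values)  # mean of the squares
    return math.sqrt(mean_of_squares)             # square root of that mean

def interquartile_range(values):
    """Third quartile minus first quartile (quartile rule is an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

print(rms_deviation(data))         # 2.0
print(statistics.pstdev(data))     # 2.0, the library's divide-by-n version
print(statistics.stdev(data))      # larger: the sample version divides by n - 1
print(interquartile_range(data))
```

The procedure described above divides by the number of values, which is what the name popStdDev( ) suggests. The sample version divides by one less than the number of values, so it comes out slightly larger.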
Which one you use depends (as with measures of center) on what you need. Here are some considerations:
If you think your data may be normally distributed, standard deviation is a natural choice, partly because the standard deviation is one of the two parameters in the normal distribution function. Many data sets are at least approximately normal, including the sampling distributions of many statistics and many kinds of measurements.
If the distribution is skewed, or if there's any reason the mean may not be an appropriate measure of center, you might prefer the interquartile range. This is because the standard deviation is about deviations from the mean, while interquartile range looks at the distribution without reference to its center.
Interquartile range (like median) is a resistant measure; that is, it changes little or not at all when extreme values change. Standard deviation, in contrast, uses every data point (as the mean does) and, because of the squaring, is especially sensitive to changes in extreme values.
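To see resistance in action, here is a small Python sketch (not Fathom) that changes only the largest value in a made-up data set and compares the two measures before and after. The data, the helper name iqr, and the "inclusive" quartile rule are all assumptions for illustration.

```python
import statistics

def iqr(values):
    """Third quartile minus first quartile (quartile rule is an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

original = [2, 4, 4, 4, 5, 5, 7, 9]
stretched = [2, 4, 4, 4, 5, 5, 7, 90]   # only the largest value has changed

for label, data in (("original", original), ("stretched", stretched)):
    spread_iqr = iqr(data)
    spread_sd = statistics.pstdev(data)  # divide-by-n standard deviation
    print(f"{label:>9}: IQR = {spread_iqr}, SD = {spread_sd:.2f}")
```

With these values the interquartile range is the same for both lists, while the standard deviation grows many times larger.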
There are other measures of spread. For example, you might construct a mean absolute deviation (in Fathom, mean(abs(x - mean(x))) ) that would not be as sensitive to outliers as the standard deviation. Or you could use the range ( max(x) - min(x) ), or percentiles other than 25 and 75, or any other quantity that somehow expresses spread. Each measure will have somewhat different properties.
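As a rough illustration, the mean absolute deviation and the range might be computed like this in Python (not Fathom). The function names and data are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values only

print(mean_absolute_deviation(data))  # 1.5
print(value_range(data))              # 7
```

Note that the range depends only on the two most extreme values, so it is the least resistant of the measures mentioned here.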