Briefly
Standard error is the slop in the mean you're estimating; standard deviation is the spread in the distribution.
The distinction between these two is a good example of what many people find confusing about statistics.
Suppose you choose sixteen people at random and determine their ages. Here they are:
5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60, 65, 70, 75, 80.
If we wanted to describe these sixteen people as a group, we could calculate the mean (42.5) and standard deviation (23.05) of their ages. (We used popStdDev(age) to compute it, the one with N in the denominator, not N-1.) These numbers are legitimate descriptors of the group of sixteen.
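If you don't have Fathom handy, the same descriptive statistics are easy to check with, say, Python's standard library (a minimal sketch; the variable name is just for illustration):

```python
import statistics

# The sixteen ages from the example above.
ages = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]

print(statistics.mean(ages))    # 42.5
print(statistics.pstdev(ages))  # 23.05..., the N-in-the-denominator standard deviation
```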
But if we want to draw conclusions about people in general (since these people were randomly selected), we do things a little more elaborately.
First, our best estimate for the mean of the population is the mean of the sample: 42.5.
Next, our best estimate for the standard deviation of the population is the "larger" standard deviation (in Fathom, the function s(age), the one with N-1 in the denominator), or 23.80.
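In the same sketchy Python, the two denominators correspond to two different library functions:

```python
import statistics

ages = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]

print(statistics.pstdev(ages))  # 23.05, N in the denominator: describes the group
print(statistics.stdev(ages))   # 23.80, N-1 in the denominator: estimates the population
```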
Now, we know that the mean is probably not exactly 42.5. How far are we likely to be off? Surely not 23.8. That's a measure of the spread of the distribution, not the slop in the estimate. Put another way, if we were to sample one more person, we wouldn't be surprised if she were 20 years old, or 65. But if we sampled 16 more people, a mean of 20 would be extremely unlikely given our first sample.
It turns out that the estimate for the slop in the mean is the standard deviation (with N-1) divided by the square root of the number of cases. In this case, that gives us 23.8/4, or 5.95. This is called the standard error. In general,
SE = s / √N

where s is the estimate of the population's standard deviation based on the sample and N is the number of cases.
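Carrying the example through in Python (again, only a sketch of the arithmetic above):

```python
import math
import statistics

ages = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]

s = statistics.stdev(ages)        # 23.80, the N-1 estimate of the population SD
se = s / math.sqrt(len(ages))     # divide by sqrt(16) = 4
print(se)                         # about 5.95, the standard error
```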
Connection to the Central Limit Theorem
If we were to sample repeatedly from a population and compute the mean of each sample, those means would be approximately normally distributed. That distribution of means has a standard deviation, and the standard error is our best estimate of it. So the 95% confidence interval for the mean, for example, extends about 1.96 standard errors on either side of the sample mean.
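A quick simulation makes this concrete. The uniform population of ages 0 to 80 and the 10,000 repetitions below are invented purely for illustration; only the sample size of 16 comes from the example above.

```python
import random
import statistics

random.seed(1)
population = list(range(81))    # a hypothetical population of ages 0..80
n = 16                          # sample size, as in the example above

# Draw many samples (with replacement) and record the mean of each.
sample_means = [statistics.mean(random.choices(population, k=n))
                for _ in range(10_000)]

spread_of_means = statistics.pstdev(sample_means)      # SD of the sampling distribution
predicted = statistics.pstdev(population) / n ** 0.5   # population SD / sqrt(n)
print(spread_of_means, predicted)                      # the two agree closely

# Roughly 95% of the sample means fall within 1.96 standard errors of the true mean.
mu = statistics.mean(population)
coverage = sum(abs(m - mu) < 1.96 * predicted for m in sample_means) / len(sample_means)
print(coverage)                                        # close to 0.95
```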