Notes from the book "Statistics for the Behavioral Sciences", 10e

 

Biased and Unbiased Statistics
Earlier we noted that sample variability tends to underestimate the variability in the corresponding population. To correct for this problem we adjusted the formula for sample variance by dividing by n - 1 instead of dividing by n. The result of the adjustment is that sample variance provides a much more accurate representation of the population variance. Specifically, dividing by n - 1 produces a sample variance that provides an unbiased estimate of the corresponding population variance. This does not mean that each individual sample variance will be exactly equal to its population variance. In fact, some sample variances will overestimate the population value and some will underestimate it. However, the average of all the sample variances will produce an accurate estimate of the population variance. This is the idea behind the concept of an unbiased statistic.

 

 

A sample statistic is unbiased if the average value of the statistic is equal to the population parameter. (The average value of the statistic is obtained from all the possible samples for a specific sample size, n.)


A sample statistic is biased if the average value of the statistic either underestimates or overestimates the corresponding population parameter. 
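
The two definitions above can also be written in symbols. This is only a sketch in notation that is not taken from the book: θ stands for a population parameter, θ̂ for the sample statistic that estimates it, and E[·] for the average over all possible samples of size n. The last line assumes independent random sampling.

```latex
% Unbiased: the average value of the statistic equals the parameter.
E[\hat{\theta}] = \theta

% Sample variance with the n - 1 divisor is unbiased:
E[s^2] = E\!\left[\frac{SS}{n-1}\right] = \sigma^2

% Dividing SS by n instead underestimates on average (biased):
E\!\left[\frac{SS}{n}\right] = \frac{n-1}{n}\,\sigma^2 < \sigma^2
```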

 

The following example demonstrates the concept of biased and unbiased statistics. 

 

EXAMPLE 4.9
We begin with a population that consists of exactly N = 6 scores: 0, 0, 3, 3, 9, 9. With a
few calculations you should be able to verify that this population has a mean of μ = 4 and
a variance of σ² = 14.
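
The "few calculations" can be checked with a short script. This is a minimal sketch, not part of the book; the variable names (`population`, `mu`, `ss`, `sigma2`) are chosen here for illustration.

```python
# Population from Example 4.9
population = [0, 0, 3, 3, 9, 9]
N = len(population)

# Population mean: sum of scores divided by N
mu = sum(population) / N

# SS: sum of squared deviations from the population mean
ss = sum((x - mu) ** 2 for x in population)

# Population variance divides SS by N (no correction is needed for a population)
sigma2 = ss / N

print("mean =", mu)
print("SS =", ss)
print("variance =", sigma2)
```

Here SS works out to 16 + 16 + 1 + 1 + 25 + 25 = 84, and 84/6 = 14.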


 

 

 

Next, we select samples of n = 2 scores from this population. In fact, we obtain every
single possible sample with n = 2. The complete set of samples is listed in Table 4.1. Notice
that the samples are listed systematically to ensure that every possible sample is included.
We begin by listing all the samples that have X = 0 as the first score, then all the samples
with X = 3 as the first score, and so on. Notice that the table shows a total of 9 samples.
Finally, we have computed the mean and the variance for each sample. Note that the
sample variance has been computed two different ways. First, we make no correction for
bias and compute each sample variance as the average of the squared deviations by simply
dividing SS by n. Second, we compute the correct sample variances for which SS is divided
by n - 1 to produce an unbiased measure of variance. You should verify our calculations
by computing one or two of the values for yourself. The complete set of sample means and
sample variances is presented in Table 4.1.
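
The procedure just described can be reproduced with a short script. This is a sketch rather than the book's own table. It assumes the 9 samples are all ordered pairs of the three distinct score values 0, 3, and 9, which is consistent with the listing order and the total of 9 given in the text. The names `values` and `rows` are illustrative.

```python
from itertools import product

# Distinct score values in the population; each occurs twice among the
# N = 6 scores, so each is assumed to be equally likely as a draw.
values = [0, 3, 9]
n = 2

rows = []
# product() lists samples systematically: first score 0, then 3, then 9
for sample in product(values, repeat=n):
    m = sum(sample) / n                        # sample mean
    ss = sum((x - m) ** 2 for x in sample)     # sum of squared deviations
    rows.append((sample, m, ss / n, ss / (n - 1)))

print("sample    mean   SS/n  SS/(n-1)")
for sample, m, biased, unbiased in rows:
    print(f"{sample!s:8} {m:5.2f} {biased:6.2f} {unbiased:9.2f}")
```

Each row holds the sample, its mean, the uncorrected variance (SS divided by n), and the corrected variance (SS divided by n - 1).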

 

First, consider the column of biased sample variances, which were calculated by dividing
by n. These 9 sample variances add up to a total of 63, which produces an average value
of 63/9 = 7. The original population variance, however, is 14. Note that the average
of the sample variances is not equal to the population variance. If the sample variance is
computed by dividing by n, the resulting values will not produce an accurate estimate of
the population variance. On average, these sample variances underestimate the population
variance and, therefore, are biased statistics.

Next, consider the column of sample variances that are computed using n - 1. Although
the population has a variance of 14, you should notice that none of the samples has
a variance exactly equal to 14. However, if you consider the complete set of sample variances,
you will find that the 9 values add up to a total of 126, which produces an average
value of 126/9 = 14.00. Thus, the average of the sample variances is exactly equal to the
original population variance. On average, the sample variance (computed using n - 1) produces
an accurate, unbiased estimate of the population variance.
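
The two column totals and averages can be checked numerically. The sketch below again assumes that the 9 samples are the ordered pairs of the values 0, 3, and 9. The helper `sum_of_squares` is a hypothetical name introduced here.

```python
from itertools import product

values = [0, 3, 9]   # distinct score values (assumed equally likely)
n = 2

def sum_of_squares(sample):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

samples = list(product(values, repeat=n))

# Variance computed two ways for every sample
biased = [sum_of_squares(s) / n for s in samples]          # divide by n
unbiased = [sum_of_squares(s) / (n - 1) for s in samples]  # divide by n - 1

avg_biased = sum(biased) / len(samples)
avg_unbiased = sum(unbiased) / len(samples)

print("biased:   total", sum(biased), "average", avg_biased)
print("unbiased: total", sum(unbiased), "average", avg_unbiased)
```

In this example the biased average is exactly half the population variance (7 versus 14), which matches the factor (n - 1)/n = 1/2 for n = 2.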

 

Finally, direct your attention to the column of sample means. For this example, the
original population has a mean of μ = 4. Although none of the samples has a mean exactly
equal to 4, if you consider the complete set of sample means, you will find that the 9 sample
means add up to a total of 36, so the average of the sample means is 36/9 = 4. Note that the
average of the sample means is exactly equal to the population mean. Again, this is what
is meant by the concept of an unbiased statistic. On average, the sample values provide
an accurate representation of the population. In this example, the average of the 9 sample
means is exactly equal to the population mean.
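
The same kind of check works for the sample means. As before, this is a sketch that assumes the 9 samples are the ordered pairs of the values 0, 3, and 9. The variable names are illustrative.

```python
from itertools import product

values = [0, 3, 9]   # distinct score values (assumed equally likely)
n = 2

# Mean of every possible sample of size n
sample_means = [sum(s) / n for s in product(values, repeat=n)]

total = sum(sample_means)
average = total / len(sample_means)

print("sample means:", sample_means)
print("total =", total, "average =", average)

# True only if no individual sample mean equals the population mean of 4
print("no sample mean equals 4:", all(m != 4 for m in sample_means))
```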

 

In summary, both the sample mean and the sample variance (using n - 1) are examples
of unbiased statistics. This fact makes the sample mean and sample variance extremely
valuable for use as inferential statistics. Although no individual sample is likely to have a mean
and variance exactly equal to the population values, both the sample mean and the sample
variance, on average, do provide accurate estimates of the corresponding population values.