An FAQ for Statistics
- Measures of central tendency: What is the most typical or representative score?
  - Median: The score that divides the distribution in half; fifty percent of respondents score at or above the median, and fifty percent score at or below it. The advantage of the median is that it is insensitive to outliers and non-normal distributions.
  - Mean: The arithmetic average of a set of scores. If the distribution is reasonably symmetrical, the mean will be quite close to the median; for skewed distributions (scores that "lean" to the left or right) the mean may differ from the median. Check the shape of a distribution before presenting the mean as the best representation of the "typical" score.
    - If the distribution is bi-modal (more responses on '1' and '7' than on any intermediate score), the mean will be seriously misleading.
    - If the distribution has high variance (a fairly equal number of scores on every point of the scale), the mean will suggest a centrality that does not exist.
  - Mode: The most frequent score. Note that for strongly skewed distributions the mean might be 4 or 5 while the mode is 7.
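The three measures above can be compared directly. A minimal sketch using only Python's standard library; the 1-to-7 survey scores below are invented for illustration:

```python
import statistics

# Strongly skewed toward the top of the scale: the mode sits at 7,
# while the few low scores pull the mean down below the median.
scores = [2, 3, 5, 6, 6, 7, 7, 7, 7, 7]

print(statistics.mean(scores))    # 5.7
print(statistics.median(scores))  # 6.5
print(statistics.mode(scores))    # 7
```

On symmetrical data the three values would nearly coincide; the gap between them here is itself a warning that the distribution is skewed.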
- Measures of dispersion: Is the distribution concentrated around the average, or spread out?
  - Standard Deviation: Measures the typical distance of scores from the mean. A small standard deviation means the scores cluster tightly around the mean; a large one means they are spread widely across the scale.
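As a sketch (the data are invented), two sets of 1-to-7 scores with the same mean but very different spreads, using Python's standard library:

```python
import statistics

tight = [3, 4, 4, 4, 5]    # clustered around the mean of 4
spread = [1, 2, 4, 6, 7]   # same mean of 4, spread across the scale

# Population standard deviation: average distance from the mean.
print(round(statistics.pstdev(tight), 2))   # 0.63
print(round(statistics.pstdev(spread), 2))  # 2.28
```

Both groups would report the same "average", but the standard deviation reveals that only the first one is actually concentrated there.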
- Measures of normality: Is the distribution like the classical "bell-shaped" curve?
  - Kurtosis: Measures how flat or peaked a distribution is. Positive scores mean the distribution is more peaked than a normal, bell-shaped distribution; negative scores mean it is flatter than a normal distribution.
  - Skew: Measures whether a distribution is symmetrical or whether it "leans" to the left or right. A positive skew means the bulk of the scores sits to the left with a long tail stretching right; a negative skew means the bulk sits to the right with a long tail stretching left.
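Both measures can be computed by hand from their standardized-moment definitions. A sketch assuming the common moment-based formulas (excess kurtosis subtracts 3 so that a normal curve scores 0); the 1-to-7 scores are invented:

```python
import statistics

def skewness(xs):
    # Third standardized moment: positive = long tail to the right.
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    # Fourth standardized moment minus 3: positive = more peaked than normal.
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

left_bulk = [1, 1, 1, 2, 2, 3, 7]    # bulk on the left, tail to the right
right_bulk = [1, 5, 6, 6, 7, 7, 7]   # bulk on the right, tail to the left

print(skewness(left_bulk) > 0)    # True: positive skew
print(skewness(right_bulk) < 0)   # True: negative skew
```

A perfectly flat distribution (one response on every scale point) would come out with negative excess kurtosis, matching the "flatter than normal" reading above.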
- Measures of association
  - Correlation: Correlation scores range from -1.0 to +1.0. A positive score means the two variables vary together: high scores on one variable are associated with higher scores on the other, and vice versa. A negative score means the two variables vary in opposite directions: a high score on one is associated with lower scores on the other. The strength of the association is reflected in the score. Correlations of .70 (or -.70) or more suggest a fairly strong association; correlations under .30 (or -.30) are fairly trivial. A good rule of thumb is to square the correlation; the result is the proportion of shared variance, which is just the extent to which the two variables "vary together". So a correlation of .70 means 49% shared variance, while a correlation of .30 means only 9%. You needn't understand the concept precisely to appreciate the difference between 49% shared variance and 9%.
  - Regression: Regression addresses a related but different notion of association. Consider a typical X-Y plot of data points on two variables: the correlation measures how tightly the data cluster around the presumed linear relationship, while the regression coefficient measures the slope of that line, independent of how tightly the data cluster around it. In the social realm we seldom see high correlations unless the regression coefficients are also high; a high correlation with a flat slope is theoretically possible, and more common in the physical or biological sciences, but with questions about attitudes, feelings, and relationships a high correlation usually portends a steep regression line as well.
- Statistical significance: Are the differences in the data real, or just random variation?
  - Statistical significance is usually reported as a probability, such as .01 or .05. Such numbers are read as: "There is only a 1% (or 5%) chance that a difference of this magnitude would arise from random error alone, that is, if there were no real difference in the underlying population."
  - The accepted standard is that a significance level above .05 is too chancy to worry about. Keep in mind, though, that if you run 40 statistical tests at the .05 level, there is a very good chance that at least 1 or 2 of the "significant" results reflect a difference that is not really there!
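The multiple-testing warning can be simulated. A sketch using only the standard library: 40 comparisons of two samples drawn from the same population (so every "significant" result is a false alarm), with the p-value approximated by a two-sided z-test; the seed and sample sizes are arbitrary choices:

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)   # arbitrary seed, for reproducibility
significant = 0
for _ in range(40):
    # Two samples from the SAME population: any "difference" is pure noise.
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    se = (statistics.variance(a) / 30 + statistics.variance(b) / 30) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided, normal approximation
    if p < 0.05:
        significant += 1

print(significant)   # typically a handful of false alarms out of 40
```

With a .05 threshold you expect about 40 × .05 = 2 false positives even when nothing real is going on, which is exactly the caution above.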
  - There are a great number of tests for statistical significance (t-tests, chi-square, etc.): tests for normal distributions and a special set of tests for non-normal distributions. The most important thing to remember, however, is not how to calculate significance but what it means. Statistical significance DOES NOT mean "important"; it DOES NOT mean "strategically significant"; it DOES NOT mean "competitive advantage". It means only that the observed difference is unlikely to be due to random variance in the choice of sample. So p = .00001 is not "really important", just "really real".
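The "really real, not really important" point can be made concrete. A sketch of a two-sample z-test with a known standard deviation (all numbers invented): with a large enough sample, even a trivially small difference becomes wildly significant:

```python
from statistics import NormalDist

n = 1_000_000   # huge sample per group
diff = 0.01     # trivially small difference between group means
sd = 1.0        # assumed known population SD for both groups

se = (sd**2 / n + sd**2 / n) ** 0.5
z = diff / se
p = 2 * (1 - NormalDist().cdf(z))   # two-sided p-value

print(round(z, 2))   # 7.07: enormous test statistic
print(p < 1e-6)      # True: far "beyond" .00001, yet the effect is 0.01 SD
```

The p-value shrinks with sample size while the effect stays tiny, so a minuscule p certifies only that the difference is not sampling noise.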