Chapter 13 — The Flaw of Averages

Mindle 5: Interrelated Uncertainties

Correlate This!

CORRELATION is a red word whose use I discourage: first, because almost no one knows what it really means, and second, because people think that when uncertain numbers are not correlated, they are NOT interrelated. Wrong!

Download Correlate This!.xls

What Correlation Means

So you really want to know what CORRELATION means? I’ll do my best.

Chapter 13-1.gif
 

I will start with the related concept of the COVARIANCE of two uncertain numbers, X and Y, which closely parallels the concept of the VARIANCE of a single uncertain number as defined in Chapter 8. Assume that past data for the X and Y values have been graphed in a scatter plot as shown below.

Chapter 13-2.gif
 

Add to the scatter plot the point representing the average of all the X and Y values, and draw a rectangle from this “average” point to each of the other points in the plane.

 

Next consider the areas of these rectangles. In particular, the rectangle defined by (X1, Y1) has area = (X1 - Avg.X)*(Y1 - Avg.Y). And how about the rectangle defined by (X2, Y2)? Since (Y2 - Avg.Y) is negative while (X2 - Avg.X) is positive, this area is negative, and I have shaded all negative rectangles in the drawing. Not exactly singles bar chitchat, huh? But at last we are ready to deliver the meaning of a red word.

The COVARIANCE of X and Y (σXY) is defined as the average area of all the rectangles (remembering that some are positive and some negative). Notice that if X was above its average every time that Y was above its average, there would be a strong positive relationship and σXY > 0. Similarly, if X and Y went in opposite directions, like the Petroleum and Airline stock example, then the negative areas would outweigh the positive ones, and σXY < 0. If X and Y were not interrelated, then the positives and negatives would cancel out, and σXY = 0. NOTE: the converse is NOT true. That is, σXY = 0 does NOT imply that there isn’t an interrelationship between the variables. See the “Correlate This!” example above. This is one of my biggest gripes with this Steam Era concept.
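The rectangle recipe above can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function name is my own, and I am assuming that “average area” means dividing by the number of points (some textbooks divide by one less than that).

```python
def covariance(xs, ys):
    """Average signed area of the rectangles drawn from the average point."""
    n = len(xs)
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    # One rectangle per data point; its area is negative when the
    # point is above one average and below the other.
    areas = [(x - avg_x) * (y - avg_y) for x, y in zip(xs, ys)]
    return sum(areas) / n  # assumption: divide by n, not n - 1


# Hypothetical data, invented for this sketch
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]    # moves with X
ys_down = [10, 8, 6, 4, 2]  # moves against X

print(covariance(xs, ys_up))    # positive
print(covariance(xs, ys_down))  # negative
```

The sign of the result tells you which kind of rectangle dominates; the size of the result is harder to interpret.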

If you glance back at the derivation of VARIANCE from the Chapter 8 web page, you will see that the average area of squares used there closely parallels the average area of rectangles used here. The problem with squared units carries over as well. For example, there is clearly a positive COVARIANCE between a person’s height in feet and their weight in pounds. And what are the units of this COVARIANCE? Foot Pounds, or (if you remember your physics) Torque! This implies, well, … um, I have no idea. But at least we are now ready for the main event.
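To see the parallel between squares and rectangles, here is a small Python sketch. The heights and weights are invented, and I am assuming the divide-by-n convention for VARIANCE, which is what `statistics.pvariance` in the Python standard library computes. Taking the covariance of X with itself turns every rectangle into a square, which gives back the variance.

```python
import statistics


def covariance(xs, ys):
    """Average signed rectangle area around the average point (divide by n)."""
    n = len(xs)
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    return sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys)) / n


# Hypothetical people, invented for this sketch
heights_ft = [5.0, 5.5, 6.0, 6.5]
weights_lb = [120.0, 150.0, 170.0, 200.0]

# Rectangles with equal sides are squares: units are square feet
var_height = covariance(heights_ft, heights_ft)
print(var_height, statistics.pvariance(heights_ft))

# Mixed sides: units are feet times pounds
cov_height_weight = covariance(heights_ft, weights_lb)
print(cov_height_weight)
```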

Definition of CORRELATION

The CORRELATION of X and Y (sometimes written as ρXY) is defined as σXY / (σX σY), where σX and σY are the standard deviations of X and Y. It is a unitless quantity (bye bye torque) and has the nice property that -1 ≤ ρXY ≤ +1. A value of -1 implies that X and Y lie on a straight line with negative slope, while a value of +1 implies that X and Y lie on a straight line with positive slope. A value of 0 implies that X and Y do not lie on a straight line, but that doesn’t mean there couldn’t be something else going on between X and Y as demonstrated in Correlate_This!.xls.
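The definition can be tried out with a short Python sketch. Everything here is illustrative: the function name and data are my own, standard deviations use the divide-by-n convention, and the parabola is a stand-in I chose, not necessarily the example in the spreadsheet.

```python
import math


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    n = len(xs)
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    cov = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - avg_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - avg_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical data, invented for this sketch
xs = [-2, -1, 0, 1, 2]
line_up = [2 * x + 1 for x in xs]     # straight line, positive slope
line_down = [-3 * x + 4 for x in xs]  # straight line, negative slope
parabola = [x * x for x in xs]        # Y is completely determined by X

print(correlation(xs, line_up))    # close to +1
print(correlation(xs, line_down))  # close to -1
print(correlation(xs, parabola))   # close to 0
```

In the parabola case Y depends entirely on X, yet the positive and negative rectangles cancel, so the correlation comes out at zero.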