E. Interpolation, Extrapolation

1. Purpose

One reason we fit a line or equation to data is to determine the value of one variable when the other variable's value is fixed. For example, Figure I-9 is a graph of a steel tape's length at different pull amounts.

 
 Figure I-9
Tape Pull Calibration

 

As pull was changed, the tape length was measured against a calibration line; length is dependent on pull.

The equation of a best-fit line determined by Linear Regression is L = 0.0059P+99.869.

2. Interpolation

How much pull should be applied to achieve a length of 100.000 ft? We can determine the pull either by scaling it from the graph (22.1 lbs, shown in red) or by solving the equation:

100.000 = 0.0059P+99.869

Rearrange to solve for P:

P = (100.000-99.869)/0.0059 = 22.2 lbs
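The rearrangement can be checked with a short Python sketch. The slope and intercept are taken from the best-fit line; the function name pull_for_length and the constant names are illustrative choices, not part of the original calibration.

```python
# Coefficients of the best-fit line L = 0.0059P + 99.869 (from the text).
SLOPE = 0.0059       # change in length (ft) per lb of pull
INTERCEPT = 99.869   # fitted length (ft) at zero pull


def pull_for_length(length_ft: float) -> float:
    """Invert L = SLOPE * P + INTERCEPT to get the pull P (lbs) for a length L (ft)."""
    return (length_ft - INTERCEPT) / SLOPE


if __name__ == "__main__":
    pull = pull_for_length(100.000)
    print(f"{pull:.1f} lbs")  # prints "22.2 lbs"
```

The result is only as trustworthy as the fit itself, and only for lengths that fall inside the measured data.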

This is called interpolation: using the data to predict one variable value from the other.

Interpolation is done within the data range. When collecting measurements, we try to bracket the range that we may later want to determine.

For example, we calibrate a tape to determine the conditions necessary for its length to be 100.000 ft. We started with a pull that yielded a tape length less than 100.000 ft, then progressively increased pull until the length exceeded 100.000 ft. Then we progressively decreased pull until the length was again less than 100.000 ft. We have enough data on both sides of 100.000 ft to reliably determine the pull needed to achieve it.

3. Extrapolation

Using the same data in Figure I-9, what will be the tape length when 25 pounds of pull are applied?

We can't determine it directly from the graph because the graph doesn't extend that far. We could extend the graph, but that's a bit cumbersome. Or we can use the equation:

L = 0.0059P+99.869 = 0.0059(25)+99.869 = 100.016 ft
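The same substitution can be written as a small Python sketch. The coefficients come from the best-fit line; the function name tape_length is an assumed, illustrative name.

```python
# Coefficients of the best-fit line L = 0.0059P + 99.869 (from the text).
SLOPE = 0.0059       # change in length (ft) per lb of pull
INTERCEPT = 99.869   # fitted length (ft) at zero pull


def tape_length(pull_lbs: float) -> float:
    """Evaluate the fitted line to get the tape length (ft) at a given pull (lbs)."""
    return SLOPE * pull_lbs + INTERCEPT


if __name__ == "__main__":
    # 0.0059 * 25 + 99.869 is about 100.0165 ft, reported as 100.016 ft in the text.
    print(f"{tape_length(25):.4f} ft")
```

The code evaluates the equation without complaint for any pull; nothing in the arithmetic indicates whether 25 lbs lies inside or outside the data range.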

This is extrapolation: predicting one variable value from the other outside the data range.

Extrapolations should be limited to values very close to the data range. The line fit is based on the behavior within the data range. We don't know what happens outside that range. Just because we can determine an equation doesn't mean it's a good predictor outside the range.
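One way to keep this caution in view is to have any prediction report whether it falls outside the measured range. The following Python sketch assumes the range is supplied by the user; the limits of 10 and 24 lbs in the usage example are hypothetical placeholders, not values read from Figure I-9, and the function name predict_length is illustrative.

```python
# Coefficients of the best-fit line L = 0.0059P + 99.869 (from the text).
SLOPE = 0.0059       # change in length (ft) per lb of pull
INTERCEPT = 99.869   # fitted length (ft) at zero pull


def predict_length(pull_lbs: float, pull_min: float, pull_max: float):
    """Return (length_ft, is_extrapolated) for a given pull.

    pull_min and pull_max are the smallest and largest pulls actually
    measured; a pull outside that interval is flagged as an extrapolation.
    """
    length_ft = SLOPE * pull_lbs + INTERCEPT
    is_extrapolated = not (pull_min <= pull_lbs <= pull_max)
    return length_ft, is_extrapolated


if __name__ == "__main__":
    # HYPOTHETICAL data range for illustration only; substitute the
    # minimum and maximum pulls from the actual calibration measurements.
    PULL_MIN, PULL_MAX = 10.0, 24.0
    for pull in (22.2, 25.0):
        length, flagged = predict_length(pull, PULL_MIN, PULL_MAX)
        label = "extrapolated" if flagged else "interpolated"
        print(f"P = {pull} lbs: L = {length:.3f} ft ({label})")
```

The flag does not say how far outside the range a prediction remains reasonable; that is still a judgment about the physical behavior of the tape.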

For example, how much pull would be needed to stretch the tape to 105.000 ft? From the equation, it would be:

P = (105.000-99.869)/0.0059 = 870 lbs

Not only is 870 lbs an unreasonable amount to attempt, the tape itself would fail before that. At some point it passes its elastic limit and won't return to its original length; keep going and it will eventually fail by breaking. We can't determine either of those points from this data set because it lies below the failure threshold.

We generally leave extrapolation to economists, politicians, and weather forecasters.