Lecture 37


 

 

 

Coefficient of Determination (r-squared)

The coefficient of determination (r-squared) is an indicator of how well the regression line fits the data, that is, how strong the linear relationship between the two variables is

r-squared = SSR/SSTO = 1 - SSE/SSTO

Value of r-squared always ranges from 0 to 1

What is the value of r-squared if there is a perfect fit between two variables?

What is the value of r-squared if there is no relationship between two variables?
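The two forms of the r-squared formula above can be checked numerically. The following is a minimal sketch in plain Python; the data set `X`, `Y` and the helper name `sums_of_squares` are hypothetical, chosen only for illustration and not taken from the lecture.

```python
# Hypothetical illustration data (not from the lecture).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

def sums_of_squares(x, y):
    """Fit Y = b0 + b1*X by least squares and return (SSR, SSE, SSTO)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = y_bar - b1 * x_bar       # intercept
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # regression
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # error
    ssto = sum((yi - y_bar) ** 2 for yi in y)              # total
    return ssr, sse, ssto

ssr, sse, ssto = sums_of_squares(X, Y)
r_squared = ssr / ssto
r_squared_alt = 1 - sse / ssto
print(r_squared, r_squared_alt)   # the two forms agree up to rounding
```

In this sketch SSTO = SSR + SSE, which is why the two forms agree. If every point lay exactly on the line, SSE would be 0 and the ratio would be 1; if the fitted line were flat, SSR would be 0 and the ratio would be 0.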

 

Coefficient of Correlation (r)

The coefficient of correlation (r) is simply the square root of the coefficient of determination (r-squared)

Some prefer to present their results in terms of r rather than r-squared

This can be a little deceptive since the magnitude of r is never smaller than r-squared (for example, r = 0.5 corresponds to an r-squared of only 0.25)

The value of r ranges from -1 to +1

The sign of r is the same as that of b1
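As a rough illustration of the points above, the sketch below computes r as the signed square root of r-squared. The function name `correlation_from_fit` and both data sets are hypothetical. It uses the identity r-squared = Sxy^2 / (Sxx * Syy), which equals SSR/SSTO in simple linear regression.

```python
import math

def correlation_from_fit(x, y):
    """Return (b1, r, r_squared), giving r the same sign as the slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)   # same value as SSR/SSTO
    r = math.copysign(math.sqrt(r_squared), b1)
    return b1, r, r_squared

# Hypothetical data: one upward trend, one downward trend.
X = [1, 2, 3, 4, 5]
Y_UP = [2, 4, 5, 4, 5]
Y_DOWN = [5, 4, 5, 4, 2]

b1_up, r_up, r2_up = correlation_from_fit(X, Y_UP)
b1_down, r_down, r2_down = correlation_from_fit(X, Y_DOWN)
print(b1_up, r_up, r2_up)
print(b1_down, r_down, r2_down)
```

Both data sets give the same r-squared, but r changes sign with the slope, and in both cases the magnitude of r is larger than r-squared.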

 

 

 

 

Inferences Concerning b1 (the slope of the line)

Hypothesis testing is conducted to determine whether or not the slope of the regression line is zero

The hypothesis that is tested is

 

Ho: beta1 = 0

H1: beta1 not equal 0

 

If we conclude Ho, does this mean that we have a strong relationship between X and Y?

 

This hypothesis test is really a test for whether or not there is a statistically significant relationship between X and Y

 

The F-test

Two different procedures can be used to test the above hypothesis - the F-test and the t-test

The two procedures are related: in simple linear regression, F* = (t*)^2, so both lead to the same conclusion

F-test

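The F-test is based on the ratio MSR/MSE

If beta1 = 0, then MSR/MSE is expected to be close to 1

The reason is that E{MSE} = sigma^2, while E{MSR} = sigma^2 + beta1^2 * SUM(Xi - X-bar)^2, so the two mean squares have the same expected value when beta1 = 0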

If F*=MSR/MSE is large, reject the null hypothesis

What is large?

The sampling distribution of F* is based on the F-distribution

The relevant critical value is F(1-a; 1, n-2)

(note F has two degrees of freedom - df for the numerator and df for the denominator)

If F* > F(1-a; 1, n-2), reject the null hypothesis and conclude H1 (there is a significant relationship between X and Y)
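The decision rule above can be sketched in a few lines of plain Python. The data set, the function name `f_test_for_slope`, and the choice a = 0.05 are assumptions made for illustration. The critical value is passed in from a printed F table rather than computed, so the sketch needs no statistics library.

```python
def f_test_for_slope(x, y, f_critical):
    """Return (F*, reject_H0) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    ssr = b1 ** 2 * sxx      # regression sum of squares
    sse = ssto - ssr         # error sum of squares
    msr = ssr / 1            # 1 degree of freedom for regression
    mse = sse / (n - 2)      # n - 2 degrees of freedom for error
    f_star = msr / mse
    return f_star, f_star > f_critical

# Hypothetical data with n = 5, so the error df is n - 2 = 3.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
F_CRIT = 10.13               # table value F(0.95; 1, 3), assuming a = 0.05

f_star, reject_h0 = f_test_for_slope(X, Y, F_CRIT)
print(f_star, reject_h0)
```

For this small made-up sample F* is about 4.5, which is below the table value, so the null hypothesis would not be rejected at a = 0.05.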

 

 

The t-test

 

 

Test the same hypothesis as F-test

The t-test is based on the sampling distribution of b1

Mean: E{b1} = beta1

Variation: sigma^2{b1} = sigma^2 / SUM(Xi - X-bar)^2

Distribution Type: Normal Distribution

 

 

The t-test

Estimated variance of b1 is

s^2{b1} = MSE / SUM(Xi - X-bar)^2

 

 

The test statistic is

t* = (b1 - 0)/s{b1}

If |t*| > t(1-a/2; n-2), reject the null hypothesis
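A minimal sketch of the t-test follows, again in plain Python with a table-supplied critical value. The data, the function name `t_test_for_slope`, and a = 0.05 are hypothetical choices. The estimated variance of b1 is taken as MSE / SUM(Xi - X-bar)^2, the standard textbook form, and the test is two-sided, so the absolute value of t* is compared with the critical value.

```python
import math

def t_test_for_slope(x, y, t_critical):
    """Return (t*, reject_H0) for the two-sided test of H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)      # estimated standard deviation of b1
    t_star = (b1 - 0) / s_b1
    return t_star, abs(t_star) > t_critical

# Hypothetical data with n = 5, so df = n - 2 = 3.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
T_CRIT = 3.182                       # table value t(0.975; 3), assuming a = 0.05

t_star, reject_h0 = t_test_for_slope(X, Y, T_CRIT)
print(t_star, t_star ** 2, reject_h0)
```

Squaring t* for this sample gives about 4.5, the same value the F-test produces on the same data, and squaring the t table value gives approximately the F table value, which is why the two tests always agree in simple linear regression.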

 

Lecture 38

 

Confidence Interval for Mean Response

One use of regression analysis is to predict the mean value of Y for a given value of X

For example, what is the average cost when the sales level is 800?

The predicted average is written as Y-hat h

The specified level of X is Xh

 

Is Y-hat a random variable? Why?

 

 

Sampling Distribution for Mean Response

What are the parameters needed to characterize the sampling distribution?

Mean

E{Y-hat h} = beta0 + beta1 * Xh

Variation

sigma^2{Y-hat h} = sigma^2 * [1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]

 

 

Distribution Type

Normal Distribution

 

Sampling Distribution for Mean Response

In practice, beta0, beta1 and sigma^2 are not known

The estimators for beta0 and beta1 are b0 and b1

The estimator for sigma^2 is MSE

Therefore, the estimator for sigma^2{Y-hat h} is s^2{Y-hat h}

s^2{Y-hat h} = MSE * [1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]

 

Shortcut for above equation

wpe6.jpg (3356 bytes)

 

Confidence Interval for Mean Response

Based on the sampling distribution the (1-a) confidence interval of the mean response is

 

Y-hat h +/- t * s{Y-hat h}

 

 

Where t = t(1-a/2; n-2)
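The interval can be sketched as follows. This assumes the standard textbook estimator s^2{Y-hat h} = MSE * [1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]. The data, the level Xh = 4, the function name `mean_response_interval`, and the table value for t are all hypothetical choices for illustration.

```python
import math

def mean_response_interval(x, y, xh, t_critical):
    """Return (Y-hat h, lower, upper) for the mean response at X = xh."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat_h = b0 + b1 * xh
    # Estimated variance of the mean response at xh.
    s2_mean = mse * (1 / n + (xh - x_bar) ** 2 / sxx)
    half_width = t_critical * math.sqrt(s2_mean)
    return y_hat_h, y_hat_h - half_width, y_hat_h + half_width

# Hypothetical data with n = 5, so df = n - 2 = 3.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
T_CRIT = 3.182                       # table value t(0.975; 3) for a 95% interval

y_hat_h, lower, upper = mean_response_interval(X, Y, xh=4, t_critical=T_CRIT)
print(y_hat_h, lower, upper)
```

With this made-up sample the point estimate at Xh = 4 is about 4.6 and the 95% interval runs from roughly 3.04 to 6.16.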

 

 

 

Factors Affecting Variance of the Mean Response

The variance of the mean response (and hence the width of the confidence interval) varies with the following parameters

I. The MSE - the larger MSE, the larger the variance of the mean response

 

II. Deviation of Xh from X-bar - the further Xh is from X-bar the larger the variance of the mean response.

From a practical point of view this means that, in planning a study, we should try to make sure that the mean value of the observed Xi's is close to the value of X that is of most interest in the study

 

III. Variability of Xi's - the greater the variability of the Xi's, the smaller the variance of the mean response. In planning our study, we want as wide a range as possible for the Xi's

 

IV. Size of Sample - the larger the sample size the smaller is the variability of the mean response
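The four factors above can be seen by changing one input at a time in the variance formula. The sketch below assumes the standard form s^2{Y-hat h} = MSE * [1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]. The function name and all numbers are hypothetical, and the sample size is varied with the other inputs held fixed purely for illustration.

```python
def variance_of_mean_response(mse, n, x_bar, sxx, xh):
    """Estimated variance of the mean response at X = xh.

    sxx is SUM(Xi - X-bar)^2, a measure of the spread of the Xi's.
    """
    return mse * (1 / n + (xh - x_bar) ** 2 / sxx)

# Hypothetical baseline values.
base = variance_of_mean_response(mse=0.8, n=5, x_bar=3, sxx=10, xh=4)

# I. Larger MSE gives a larger variance.
larger_mse = variance_of_mean_response(mse=1.6, n=5, x_bar=3, sxx=10, xh=4)

# II. Xh farther from X-bar gives a larger variance.
farther_xh = variance_of_mean_response(mse=0.8, n=5, x_bar=3, sxx=10, xh=5)

# III. More spread in the Xi's (larger sxx) gives a smaller variance.
wider_x = variance_of_mean_response(mse=0.8, n=5, x_bar=3, sxx=20, xh=4)

# IV. Larger sample size gives a smaller variance.
larger_n = variance_of_mean_response(mse=0.8, n=10, x_bar=3, sxx=10, xh=4)

print(base, larger_mse, farther_xh, wider_x, larger_n)
```

In a real study a larger sample usually also increases SUM(Xi - X-bar)^2, so factors III and IV tend to work together.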

 

Lecture 39

 

Prediction Interval for New Response

What is the difference between the confidence interval and the prediction interval?

What is the difference between the mean response and the new response?

Mean response - this is the average value expected for a given level of X

New response - this is the prediction for an individual response for a given level of X

Which would have the larger variance - the mean response or the new response?

 

Sampling Distribution for New Response

Mean

E{Y-hat h(new)} = beta0 + beta1 * Xh

Variation

sigma^2{Y-hat h(new)} = sigma^2 * [1 + 1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]

 

Distribution Type

Normal Distribution

 

Prediction Interval for New Response

The estimator for sigma^2{Y-hat h(new)} is s^2{Y-hat h(new)}

s^2{Y-hat h(new)} = MSE * [1 + 1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2]

 

 

Prediction Interval for New Response

 

Y-hat h +/- t * s{Y-hat h(new)}

 

Where t = t(1-a/2; n-2)
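A sketch of the prediction interval, set beside the variance of the mean response for comparison, is given below. It assumes the standard textbook estimator s^2{Y-hat h(new)} = MSE * [1 + 1/n + (Xh - X-bar)^2 / SUM(Xi - X-bar)^2], which is the variance of the mean response plus MSE. The data, Xh = 4, the function name `prediction_interval`, and the table value for t are hypothetical.

```python
import math

def prediction_interval(x, y, xh, t_critical):
    """Return (Y-hat h, lower, upper, s2_mean, s2_new) for a new response at xh."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat_h = b0 + b1 * xh
    s2_mean = mse * (1 / n + (xh - x_bar) ** 2 / sxx)   # mean response
    s2_new = mse + s2_mean                              # individual new response
    half_width = t_critical * math.sqrt(s2_new)
    return y_hat_h, y_hat_h - half_width, y_hat_h + half_width, s2_mean, s2_new

# Hypothetical data with n = 5, so df = n - 2 = 3.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
T_CRIT = 3.182                       # table value t(0.975; 3) for a 95% interval

y_hat_h, lower, upper, s2_mean, s2_new = prediction_interval(X, Y, 4, T_CRIT)
print(y_hat_h, lower, upper, s2_mean, s2_new)
```

The extra MSE term reflects the scatter of an individual observation around the regression line, so the new response always has the larger variance and the prediction interval is wider than the confidence interval for the mean response at the same Xh.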

 

 

 


Date of last update - 09 Dec 1998