Lecture 37
Coefficient of Determination (r-squared)
The coefficient of determination (r-squared) is an indicator of whether or not there is a good fit (or relationship) between the two variables
$r^2 = SSR/SSTO = 1 - SSE/SSTO$
The value of r-squared always ranges from 0 to 1
What is the value of r-squared if there is a perfect fit between two variables?
What is the value of r-squared if there is no relationship between two variables?
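The two forms of the r-squared formula can be checked numerically. The sketch below is only an illustration: the data values and the helper names `fit_line` and `sums_of_squares` are hypothetical and do not come from the lecture.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line Y = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SSR, SSE, SSTO) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
    ssto = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return ssr, sse, ssto


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ssr, sse, ssto = sums_of_squares(x, y)
r2_from_ssr = ssr / ssto
r2_from_sse = 1 - sse / ssto
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SSTO = {ssto:.2f}")
print(f"r-squared = {r2_from_ssr:.3f} (SSR/SSTO) = {r2_from_sse:.3f} (1 - SSE/SSTO)")
```

Because SSR + SSE = SSTO for a least-squares line, both expressions give the same value.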
Coefficient of Correlation (r)
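The coefficient of correlation (r) is simply the square root of the coefficient of determination (r-squared)
Some prefer to present their results in terms of r rather than r-squared
This can be a little deceptive since |r| is larger than r-squared whenever r-squared lies strictly between 0 and 1 (for example, r = 0.7 corresponds to r-squared = 0.49)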
The value of r ranges from -1 to +1
The sign of r is the same as that of b1
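As a rough illustration of how r is obtained from r-squared and the sign of the slope, the sketch below uses two small hypothetical data sets, one trending up and one trending down. The function name `slope_and_r` is an assumption made for this example.

```python
import math


def slope_and_r(x, y):
    """Return (b1, r_squared, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SSTO
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    # SSR = b1^2 * sxx, so SSR/SSTO reduces to sxy^2 / (sxx * syy).
    r_squared = sxy ** 2 / (sxx * syy)
    # r is the square root of r-squared, carrying the sign of the slope b1.
    r = math.copysign(math.sqrt(r_squared), b1)
    return b1, r_squared, r


# Hypothetical data: same X values, one increasing and one decreasing response.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]

for label, y in (("increasing", y_up), ("decreasing", y_down)):
    b1, r_squared, r = slope_and_r(x, y)
    print(f"{label}: b1 = {b1:+.2f}, r-squared = {r_squared:.3f}, r = {r:+.3f}")
```

Both data sets give the same r-squared, but r is positive for the first and negative for the second, matching the sign of b1.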
Inferences Concerning b1 (the slope of the line)
Hypothesis testing is conducted to determine whether or not the slope of the regression line is zero
The hypothesis that is tested is
Ho: beta1 = 0
H1: beta1 not equal 0
If we conclude Ho, does this mean that we have a strong relationship between X and Y?
This hypothesis test is really a test for whether or not there is a statistically significant relationship between X and Y
The F-test
Two different procedures can be used to test the above hypothesis - the F-test and the t-test
Both procedures are related
F-test
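The F-test is based on the ratio MSR/MSE
If beta1 = 0, MSR/MSE is expected to be close to 1
The reason is that $E\{MSE\} = \sigma^2$ and $E\{MSR\} = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$, so the two mean squares have the same expected value only when beta1 = 0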
If F*=MSR/MSE is large, reject the null hypothesis
What is large?
The sampling distribution of F* is based on the F-distribution
The relevant critical value is $F(1-\alpha; 1, n-2)$ (note that F has two degrees of freedom: df for the numerator and df for the denominator)
If $F^* > F(1-\alpha; 1, n-2)$, reject the null hypothesis and accept H1 (there is a significant relationship between X and Y)
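The decision rule can be sketched in a few lines. This is an illustration under stated assumptions: the data are hypothetical, the function name `f_statistic` is made up for the example, and the critical value F(0.95; 1, 3) = 10.13 is read from a standard F table for alpha = 0.05 rather than computed.

```python
def f_statistic(x, y):
    """Return F* = MSR/MSE for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msr = ssr / 1          # regression has 1 degree of freedom
    mse = sse / (n - 2)    # error has n - 2 degrees of freedom
    return msr / mse


# Hypothetical data (n = 5, so the error df is 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

F_CRIT = 10.13  # assumed table value of F(0.95; 1, 3), i.e. alpha = 0.05

f_star = f_statistic(x, y)
reject_h0 = f_star > F_CRIT
print(f"F* = {f_star:.2f}, critical value = {F_CRIT}, reject Ho: {reject_h0}")
```

With only five observations the critical value is large, so even a visible upward trend in this small sample does not reach significance.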
The t-test
Tests the same hypothesis as the F-test
The t-test is based on the sampling distribution of b1
Mean: $E\{b_1\} = \beta_1$
Variation: $\sigma^2\{b_1\} = \sigma^2 / \sum (X_i - \bar{X})^2$
Distribution Type: Normal Distribution
The t-test
Estimated variance of b1 is $s^2\{b_1\} = MSE / \sum (X_i - \bar{X})^2$
The test statistic is
$t^* = (b_1 - 0)/s\{b_1\}$
If $|t^*| > t(1-\alpha/2; n-2)$, reject the null hypothesis
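A minimal sketch of the t-test for the slope follows. It assumes the standard estimate s^2{b1} = MSE / sum((Xi - X-bar)^2), uses hypothetical data, and takes the critical value t(0.975; 3) = 3.182 from a standard t table. The function name `slope_t_statistic` is made up for the example.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, s_b1, t_star) for testing Ho: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)   # estimated standard deviation of b1
    t_star = (b1 - 0) / s_b1
    return b1, s_b1, t_star


# Hypothetical data (n = 5, so df = n - 2 = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

T_CRIT = 3.182  # assumed table value of t(0.975; 3), i.e. alpha = 0.05, two-sided

b1, s_b1, t_star = slope_t_statistic(x, y)
reject_h0 = abs(t_star) > T_CRIT
print(f"b1 = {b1:.2f}, s(b1) = {s_b1:.4f}, t* = {t_star:.4f}, reject Ho: {reject_h0}")
print(f"t* squared = {t_star ** 2:.2f}")
```

In simple linear regression t* squared equals F* = MSR/MSE, which is why the two procedures always lead to the same conclusion.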
Lecture 38
Confidence Interval for Mean Response
One use of regression analysis is to predict the mean value of Y for a given value of X
For example, what is the average cost when the sales level is 800?
The predicted average is written as Y-hat
The specified level of X is Xh
Is Y-hat a random variable? Why?
Sampling Distribution for Mean Response
What are the parameters needed to characterize the sampling distribution?
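Mean: $E\{\hat{Y}_h\} = \beta_0 + \beta_1 X_h$
Variation: $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
Distribution Type: Normal Distribution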
Sampling Distribution for Mean Response
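In practice, $\beta_0$, $\beta_1$ and $\sigma^2$ are not known
The estimators for $\beta_0$ and $\beta_1$ are b0 and b1
The estimator for $\sigma^2$ is MSE
Therefore, the estimator for $\sigma^2\{\hat{Y}_h\}$ is $s^2\{\hat{Y}_h\}$
$s^2\{\hat{Y}_h\} = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$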
Shortcut for above equation
Confidence Interval for Mean Response
Based on the sampling distribution, the (1-alpha) confidence interval for the mean response is
$\hat{Y}_h \pm t \cdot s\{\hat{Y}_h\}$
Where $t = t(1-\alpha/2; n-2)$
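The interval can be worked through numerically. The sketch below assumes the standard estimate s^2{Y-hat h} = MSE [1/n + (Xh - X-bar)^2 / sum((Xi - X-bar)^2)], uses hypothetical data, and takes t(0.975; 3) = 3.182 from a standard t table. The function name `mean_response_interval` is made up for the example.

```python
import math


def mean_response_interval(x, y, xh, t_crit):
    """Return (y_hat_h, s_y_hat_h, (lower, upper)) for the mean response at xh."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat_h = b0 + b1 * xh                                    # point estimate of the mean response
    s_y_hat_h = math.sqrt(mse * (1 / n + (xh - x_bar) ** 2 / sxx))
    half_width = t_crit * s_y_hat_h
    return y_hat_h, s_y_hat_h, (y_hat_h - half_width, y_hat_h + half_width)


# Hypothetical data (n = 5, df = 3) and a hypothetical level Xh = 4.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # assumed table value of t(0.975; 3) for a 95% interval

y_hat_h, s_y_hat_h, (lower, upper) = mean_response_interval(x, y, xh=4, t_crit=T_CRIT)
print(f"Y-hat h = {y_hat_h:.2f}, s(Y-hat h) = {s_y_hat_h:.4f}")
print(f"95% confidence interval for the mean response: ({lower:.3f}, {upper:.3f})")
```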
Factors Affecting Variance of the Mean Response
The variance of the mean response (and hence the width of the confidence interval) varies with the following parameters
I. The MSE - the larger MSE, the larger the variance of the mean response
II. Deviation of Xh from X-bar - the further Xh is from X-bar the larger the variance of the mean response.
From a practical point of view this means that, in planning a study, we should try to make sure that the mean of the observed Xi values is close to the value of X that is of most interest in the study
III. Variability of the Xi values - the greater the variability of the Xi values, the smaller the variance of the mean response. In planning our study, we want as wide a range as possible for the Xi values
IV. Size of Sample - the larger the sample size the smaller is the variability of the mean response
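The four factors can be seen by changing one input at a time in the estimated variance of the mean response. This is a sketch only: it assumes the standard formula MSE [1/n + (Xh - X-bar)^2 / sum((Xi - X-bar)^2)], and every number below is hypothetical. Changing n while holding the sum of squares fixed is an artificial simplification made to isolate each effect.

```python
def var_mean_response(mse, n, xh, x_bar, sxx):
    """Estimated variance of the mean response at xh.

    sxx is the sum of squared deviations of the Xi values from x_bar.
    """
    return mse * (1 / n + (xh - x_bar) ** 2 / sxx)


# Hypothetical baseline: MSE = 0.8, n = 5, X-bar = 3, sxx = 10, Xh = 5.
baseline = var_mean_response(mse=0.8, n=5, xh=5, x_bar=3, sxx=10)

# I. Larger MSE gives a larger variance.
larger_mse = var_mean_response(mse=1.6, n=5, xh=5, x_bar=3, sxx=10)

# II. Xh closer to X-bar gives a smaller variance.
xh_at_mean = var_mean_response(mse=0.8, n=5, xh=3, x_bar=3, sxx=10)

# III. More spread in the Xi values (larger sxx) gives a smaller variance.
wider_x = var_mean_response(mse=0.8, n=5, xh=5, x_bar=3, sxx=40)

# IV. Larger sample size gives a smaller variance.
larger_n = var_mean_response(mse=0.8, n=20, xh=5, x_bar=3, sxx=10)

print(f"baseline            {baseline:.2f}")
print(f"I.   larger MSE     {larger_mse:.2f}")
print(f"II.  Xh at X-bar    {xh_at_mean:.2f}")
print(f"III. wider Xi range {wider_x:.2f}")
print(f"IV.  larger n       {larger_n:.2f}")
```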
Lecture 39
Prediction Interval for New Response
What is the difference between the confidence interval and the prediction interval?
What is the difference between the mean response and the new response?
Mean response - this is the average value expected for a given level of X
New response - this is the prediction for an individual response for a given level of X
Which would have the larger variance - the mean response or the new response?
Sampling Distribution for New Response
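Mean: $E\{\hat{Y}_{h(new)}\} = \beta_0 + \beta_1 X_h$
Variation: $\sigma^2\{\hat{Y}_{h(new)}\} = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$ (the variance of an individual response plus the variance of the estimated mean response)
Distribution Type: Normal Distribution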
Prediction Interval for New Response
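The estimator for $\sigma^2\{\hat{Y}_{h(new)}\}$ is $s^2\{\hat{Y}_{h(new)}\}$
$s^2\{\hat{Y}_{h(new)}\} = MSE \left[ 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$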
Prediction Interval for New Response
$\hat{Y}_h \pm t \cdot s\{\hat{Y}_{h(new)}\}$
Where $t = t(1-\alpha/2; n-2)$
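To see how the prediction interval compares with the confidence interval for the mean response, the sketch below computes both at the same Xh. It assumes the standard estimates s^2{Y-hat h} = MSE [1/n + (Xh - X-bar)^2 / sum((Xi - X-bar)^2)] and s^2{Y-hat h new} = MSE + s^2{Y-hat h}. The data, the level Xh = 4, and the function name `intervals_at` are hypothetical, and t(0.975; 3) = 3.182 is taken from a standard t table.

```python
import math


def intervals_at(x, y, xh, t_crit):
    """Return (y_hat_h, mean_interval, prediction_interval) at the level xh."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    y_hat_h = b0 + b1 * xh
    s2_mean = mse * (1 / n + (xh - x_bar) ** 2 / sxx)   # variance of the mean response
    s2_new = mse + s2_mean                              # variance for a new individual response
    half_mean = t_crit * math.sqrt(s2_mean)
    half_new = t_crit * math.sqrt(s2_new)
    mean_interval = (y_hat_h - half_mean, y_hat_h + half_mean)
    prediction_interval = (y_hat_h - half_new, y_hat_h + half_new)
    return y_hat_h, mean_interval, prediction_interval


# Hypothetical data (n = 5, df = 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # assumed table value of t(0.975; 3) for 95% intervals

y_hat_h, mean_int, pred_int = intervals_at(x, y, xh=4, t_crit=T_CRIT)
print(f"Y-hat h = {y_hat_h:.2f}")
print(f"95% confidence interval (mean response): ({mean_int[0]:.3f}, {mean_int[1]:.3f})")
print(f"95% prediction interval (new response):  ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

Both intervals are centred on the same Y-hat h. The prediction interval is wider because it adds the variability of an individual observation (MSE) to the uncertainty in the fitted line.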
Date of last update - 09 Dec 1998