Lecture 34
Simple Linear Regression
Regression analysis is used to investigate the relationship between two or more variables.
Specifically, it is used to answer the following questions:
Is there a relationship?
What is the nature of the relationship?
How good is the relationship?
Once the relationship is developed it can be used for PREDICTION.
Probabilistic Model
Relationships that we are used to in engineering are deterministic in nature: we assume there is a perfect one-to-one relationship between the variables (for example, acceleration versus velocity).
Regression relationships are PROBABILISTIC: the relationship is not perfect, and there is some error involved in mapping from one variable to the other.
The model takes the general form
y = deterministic component + random error
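The probabilistic model can be illustrated with a small simulation. The Python sketch below assumes a straight-line deterministic component and normally distributed random error, and generates one observation for each of a few X values. The function name `simulate` and the parameter values (intercept 2.0, slope 0.5, error standard deviation 1.0) are hypothetical choices for illustration only.

```python
import random


def simulate(xs, beta0, beta1, sigma, seed=None):
    """Return (deterministic, observed) lists for y = beta0 + beta1*x + error.

    Hypothetical helper: the error is assumed to be drawn independently
    from a normal distribution with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    # Deterministic component: the straight line itself
    deterministic = [beta0 + beta1 * x for x in xs]
    # Random error added on top of the deterministic component
    observed = [d + rng.gauss(0.0, sigma) for d in deterministic]
    return deterministic, observed


xs = [1, 2, 3, 4, 5]
det, obs = simulate(xs, beta0=2.0, beta1=0.5, sigma=1.0, seed=1)
for x, d, y in zip(xs, det, obs):
    print(f"x={x}  deterministic={d:.2f}  observed={y:.2f}")
```

Rerunning with a different `seed` changes the observed values but not the deterministic component, which is the sense in which the relationship is probabilistic rather than deterministic.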
Estimation of the Regression Parameters
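Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
The regression parameters ($\beta_0$, $\beta_1$, and $\varepsilon_i$) are estimated from SAMPLE DATA. The estimators for $\beta_0$, $\beta_1$, and $\varepsilon_i$ are written as $\hat\beta_0$, $\hat\beta_1$, and $\hat\varepsilon_i$, respectively. The fitted line is $\hat Y$, where
$$\hat Y = \hat\beta_0 + \hat\beta_1 X$$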
Method of Least Squares
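The estimators $\hat\beta_0$ and $\hat\beta_1$ are obtained using the method of least squares. The least squares line is the line for which the sum of the squared vertical deviations of the observations from the line is a minimum.
In other words, $\hat\beta_0$ and $\hat\beta_1$ are the values of $\beta_0$ and $\beta_1$ that minimize
$$Q = \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2$$
The solution to this minimization gives the equations for $\hat\beta_0$ and $\hat\beta_1$.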
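Setting the partial derivatives of the sum of squared deviations with respect to each parameter equal to zero and solving gives the familiar closed-form estimators
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
The Python sketch below applies these formulas directly. The function name `least_squares` and the five data points are made up for illustration; they are not the course data.

```python
def least_squares(xs, ys):
    """Return (b0_hat, b1_hat) minimizing sum((y - b0 - b1*x)^2).

    Hypothetical helper implementing the closed-form least squares solution.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered cross-product (SSxy) and centered sum of squares of x (SSxx)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1_hat = ss_xy / ss_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Made-up sample data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = least_squares(xs, ys)
print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```

The intermediate sums `ss_xy` and `ss_xx` correspond to $SS_{xy}$ and $SS_{xx}$ in the textbook notation.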
Lecture 35
Mean Response
For a given value of $X$, the mean response for $Y$ is
$$E\{Y\} = \beta_0 + \beta_1 X$$
The mean response can be estimated from the estimated regression function.
The point estimator for the mean response is
$$\hat Y = \hat\beta_0 + \hat\beta_1 X$$
Residual or Error
$$\hat\varepsilon_i = Y_i - \hat Y_i$$
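Once the estimates are available, the point estimate of the mean response and the residuals each take one line. In the Python sketch below, the estimates $\hat\beta_0 = 1$ and $\hat\beta_1 = 2$ and the four data points are hypothetical (they happen to be the least squares fit for these points), and the function names are illustrative.

```python
def mean_response(b0_hat, b1_hat, x):
    """Point estimate of E{Y} at a given x: y_hat = b0_hat + b1_hat * x."""
    return b0_hat + b1_hat * x


def residuals(b0_hat, b1_hat, xs, ys):
    """Residuals e_i = y_i - y_hat_i for each observation."""
    return [y - mean_response(b0_hat, b1_hat, x) for x, y in zip(xs, ys)]


# Hypothetical estimates and data, for illustration only
b0_hat, b1_hat = 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.5, 2.5, 4.5, 7.5]
print(mean_response(b0_hat, b1_hat, 4.0))   # 9.0
print(residuals(b0_hat, b1_hat, xs, ys))    # [0.5, -0.5, -0.5, 0.5]
```

For a least squares fit with an intercept, the residuals always sum to zero, which makes a quick sanity check on hand calculations.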
Sum of Squares
Variation about the mean is measured by the
TOTAL SUM OF SQUARES (SSTO)
$$SSTO = \sum_{i=1}^{n} (Y_i - \bar Y)^2$$
Variation about the regression line (the error term) is measured by the
ERROR SUM OF SQUARES (SSE)
$$SSE = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$$
The SSTO is made up of two parts:
REGRESSION SUM OF SQUARES (SSR)
ERROR SUM OF SQUARES (SSE)
SSTO = SSR + SSE
Translation to Textbook Notation
SSTO = $SS_{yy}$
SSR = $\hat\beta_1 SS_{xy}$, where $SS_{xy} = \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)$
Partitioning of the Sum of Squares
$SS_{yy}$ (the SSTO) is made up of two parts:
REGRESSION SUM OF SQUARES (SSR)
ERROR SUM OF SQUARES (SSE)
SSTO = SSR + SSE
The REGRESSION SUM OF SQUARES is the reduction in the variability of $Y$ obtained by using the regression function instead of the mean $\bar Y$:
$$SSR = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$$
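The partition can be checked numerically. The Python sketch below fits the least squares line to a small hypothetical data set, computes the three sums of squares from their definitions, and confirms that SSTO = SSR + SSE (the identity holds for a least squares line with an intercept). The function name `sums_of_squares` and the data are illustrative only.

```python
def sums_of_squares(xs, ys):
    """Return (SSTO, SSR, SSE) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Closed-form least squares estimates
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy / ss_xx
    b0_hat = y_bar - b1_hat * x_bar
    y_hat = [b0_hat + b1_hat * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)              # variation about the mean
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # variation about the line
    return ssto, ssr, sse


# Hypothetical data, for illustration only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.5, 2.5, 4.5, 7.5]
ssto, ssr, sse = sums_of_squares(xs, ys)
print(ssto, ssr, sse)   # 21.0 20.0 1.0
```

Here $\hat\beta_1 = 2$ and $SS_{xy} = 10$, so SSR also equals $\hat\beta_1 SS_{xy} = 20$.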
Degrees of Freedom (df)
Each sum of squares has an associated number of degrees of freedom:
SSTO: df = n - 1
SSR: df = 1
SSE: df = n - 2
Note:
SSTO = SSR + SSE
Similarly,
df of SSTO = df of SSR + df of SSE
n - 1 = 1 + (n - 2)
Mean Sum of Squares
The mean sums of squares are found by dividing each sum of squares by its corresponding degrees of freedom.
There are two mean sums of squares:
REGRESSION MEAN SUM OF SQUARES (MSR)
ERROR MEAN SUM OF SQUARES (MSE)
MSR = SSR / (df for SSR) = SSR / 1
MSE = $s^2$ = SSE / (df for SSE) = SSE / (n - 2)
MSE is the point estimator of the variance of the error term, $\sigma^2$.
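With the degrees of freedom in hand, the mean squares follow by division. A minimal Python sketch; the function name `mean_squares` and the inputs (SSR = 20, SSE = 1, n = 4, from a small made-up data set) are hypothetical.

```python
import math


def mean_squares(ssr, sse, n):
    """Return (MSR, MSE, s) for simple linear regression with n observations."""
    if n < 3:
        raise ValueError("need n >= 3 so that SSE has n - 2 > 0 degrees of freedom")
    msr = ssr / 1          # SSR has 1 degree of freedom
    mse = sse / (n - 2)    # SSE has n - 2 degrees of freedom; MSE = s^2
    s = math.sqrt(mse)     # estimated standard deviation of the error term
    return msr, mse, s


# Sums of squares from a hypothetical 4-observation data set
msr, mse, s = mean_squares(ssr=20.0, sse=1.0, n=4)
print(msr, mse, round(s, 4))   # 20.0 0.5 0.7071
```

The square root of MSE, $s$, is the estimate of the error standard deviation that reappears in the expected residuals for the normal plot.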
Lecture 36
Residual Analysis
Residual Analysis is used to evaluate whether or not the model that is fitted to the data is appropriate
Four issues are considered in evaluating whether or not the model is appropriate:
Linearity - is the relationship between the two variables linear?
Constant Variance - does the error term ($\varepsilon_i$) have constant variance at all levels of X?
Normality - are the error terms normally distributed?
Independence - are the error terms independent?
Independence can be difficult to assess since it concerns the relationship between the error term and a third variable (not X or Y). One common example is a lack of independence over time, where errors for observations taken close together in time are correlated.
Two plots are used for residual analysis:
Residual Plot - residuals vs fitted values
Normal Plot - actual residuals vs the residuals expected under normality
The residual plot is used to evaluate linearity, constant variance, and independence (and, to some extent, normality).
The normal plot is used to evaluate normality :)
Normal Plot
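Plot of actual vs expected residuals under normality
The expected value of the ith ranked residual (ranked from smallest to largest) under normality is given by
$$\sqrt{MSE}\; z\!\left(\frac{i - 0.5}{n}\right)$$
that is, (standard normal value for the ith ranked item) times (estimated standard deviation of the error term).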
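The expected residuals can be computed with the standard normal quantile function. The Python sketch below uses `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8 or later) for $z$, following the $\sqrt{MSE}\, z((i - 0.5)/n)$ rule. The function names and the example residuals and MSE are hypothetical. It returns (expected, actual) pairs, with the actual residuals sorted from smallest to largest, ready to plot.

```python
import math
from statistics import NormalDist


def expected_residuals(n, mse):
    """Expected value of the i-th smallest residual under normality, i = 1..n.

    Uses sqrt(MSE) * z((i - 0.5) / n), where z is the standard normal
    quantile (inverse CDF).
    """
    z = NormalDist().inv_cdf
    s = math.sqrt(mse)
    return [s * z((i - 0.5) / n) for i in range(1, n + 1)]


def normal_plot_points(residuals, mse):
    """Pair each ranked (sorted) residual with its expected value under normality."""
    ranked = sorted(residuals)
    return list(zip(expected_residuals(len(ranked), mse), ranked))


# Hypothetical residuals and MSE, for illustration only
resid = [0.5, -0.5, -0.5, 0.5]
for expected, actual in normal_plot_points(resid, mse=0.5):
    print(f"expected={expected:+.3f}  actual={actual:+.3f}")
```

If the plotted points fall roughly on a straight line, the normality assumption is reasonable; marked curvature or points far off the line at the ends suggest non-normality.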
If we have ten items, what is (i - 0.5)/n for the second smallest?
What is z((i - 0.5)/n) for the second smallest?
What does z((i - 0.5)/n) give us?
Worksheet for calculating expected residuals!
Class Work - Extra Credit (due Friday 12/4)
Use the vending machine data for the following, with
$\hat\beta_0 = 55.74$, $\hat\beta_1 = 0.0454$
I. Plot the residual plot.
II. Plot the normal plot.
III. Is there evidence of non-linearity, non-constant variance, or non-normality?
IV. Given that machines 1, 2, 3, and 6 are maintained by one contractor and machines 4 and 5 by a second, state whether or not there is evidence of non-independence of the error terms.
Date of last update - 30 Nov 1998