Lecture 34


 

 

Simple Linear Regression

Regression analysis is used to investigate the relationship between two or more variables

Specifically, it is used to answer the following questions:

  • Is there a relationship?

  • What is the nature of the relationship?

  • How good is the relationship?

Once the relationship is developed it can be used for PREDICTION.

 

 

Probabilistic Model

Relationships that we are used to in engineering are deterministic in nature - we assume there is a perfect one-to-one relationship between the variables (for example: acceleration versus velocity)

Regression relationships are PROBABILISTIC - the relationship is not perfect - there is some error involved in mapping from one variable to another

The model takes the general form

y = deterministic component + random error
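The general form above can be illustrated with a small simulation. This is only a sketch: it assumes a straight-line deterministic component and normally distributed errors, and the names simulate_responses, beta0, beta1, and sigma, along with the numbers used, are made up for illustration.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Generate y = (beta0 + beta1 * x) + random error for each x."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        deterministic = beta0 + beta1 * x   # the error-free part of the model
        error = rng.gauss(0.0, sigma)       # assumed normal error with sd sigma
        ys.append(deterministic + error)
    return ys

xs = [1, 2, 3, 4, 5]
exact = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=0.0)  # deterministic case
noisy = simulate_responses(xs, beta0=2.0, beta1=0.5, sigma=1.0)  # probabilistic case
print(exact)
print(noisy)
```

With sigma set to zero every point falls exactly on the line; with sigma greater than zero the points scatter about it.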

 

Estimation of the Regression Parameters

Regression Model

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

 

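The regression parameters ($\beta_0$ and $\beta_1$) and the error terms ($\varepsilon_i$) are estimated from SAMPLE DATA

The estimators for $\beta_0$, $\beta_1$, and $\varepsilon_i$ are written as $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\varepsilon}_i$, respectively

The fitted line is $\hat{y}$, where

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$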

 

Method of Least Squares

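The estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained using the method of least squares

The least squares line is the line for which the sum of the squared deviations of the observations from the line is a minimum

In other words, $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to minimize

$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$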

 

Solving this minimization problem gives the equations for $\hat{\beta}_0$ and $\hat{\beta}_1$.
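For a straight-line model the least squares problem has a well-known closed-form solution: the slope estimate is SSxy / SSxx and the intercept estimate is y-bar minus the slope estimate times x-bar. The sketch below computes both. The function name least_squares_fit and the five data points are made up for illustration and are not the course data.

```python
def least_squares_fit(xs, ys):
    """Return (b0_hat, b1_hat) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # SSxy = sum of (x - x_bar)(y - y_bar); SSxx = sum of (x - x_bar)^2
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = ss_xy / ss_xx            # slope estimate
    b0_hat = y_bar - b1_hat * x_bar   # intercept estimate
    return b0_hat, b1_hat

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0_hat, b1_hat = least_squares_fit(xs, ys)
print(round(b0_hat, 4), round(b1_hat, 4))  # 2.2 0.6
```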

 

 

wpe1.jpg (43389 bytes)

 

 

Lecture 35

 

 

Mean Response

For a given value of x, the mean response for y is

$E(y) = \beta_0 + \beta_1 x$

 

 

The mean response can be estimated from the estimated regression function

The point estimator for the mean response is

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

 

 

Residual or Error

error: $\hat{\varepsilon}_i = y_i - \hat{y}_i$
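The point estimates and the residuals can be computed together. A minimal sketch, assuming the estimates are already in hand; the data are made up, and 2.2 and 0.6 are the least squares estimates for that made-up data.

```python
def fitted_and_residuals(xs, ys, b0_hat, b1_hat):
    """Return the fitted values y-hat and the residuals y - y-hat."""
    fitted = [b0_hat + b1_hat * x for x in xs]
    residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]
    return fitted, residuals

# Made-up illustrative data with its least squares estimates
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fitted, residuals = fitted_and_residuals(xs, ys, b0_hat=2.2, b1_hat=0.6)
for x, y, y_hat, e in zip(xs, ys, fitted, residuals):
    print(x, y, round(y_hat, 2), round(e, 2))
```

For a least squares line the residuals sum to zero (up to rounding), which is a quick check on the arithmetic.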

 

 

 

Sum of Squares

Variation about the mean is measured by the

TOTAL SUM OF SQUARES (SSTO)

$SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$

 

 

 

Variation about the regression line (the error term) is measured by the

ERROR SUM OF SQUARES (SSE)

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

 

 

 

Partitioning of the Sum of Squares

The SSTO is made up of two parts

  • REGRESSION SUM OF SQUARES (SSR)

  • ERROR SUM OF SQUARES (SSE)

SSTO = SSR + SSE
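A short sketch of why this partition holds for a least squares line. Here SSR is taken to be the sum of squared deviations of the fitted values from the mean, and the index i and count n are notation added for this note.

```latex
\begin{align*}
y_i - \bar{y} &= (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \\
\sum_{i=1}^{n} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
   + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
   + 2 \sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) \\
  &= SSR + SSE + 0
\end{align*}
```

The cross-product term is zero because the least squares residuals satisfy $\sum (y_i - \hat{y}_i) = 0$ and $\sum x_i (y_i - \hat{y}_i) = 0$.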

 

Translation to Text

SSTO = SSyy

SSR = SSyy - SSE = $\hat{\beta}_1$ SSxy, where SSxy = $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

 

The REGRESSION SUM OF SQUARES is the reduction in variability obtained by using the regression function instead of the mean (SSR = SSTO - SSE)

$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
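The three sums of squares can be computed directly from their definitions and the partition checked numerically. A minimal sketch on made-up data; the function name sums_of_squares is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SSTO, SSR, SSE) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
              / sum((x - x_bar) ** 2 for x in xs))
    b0_hat = y_bar - b1_hat * x_bar
    fitted = [b0_hat + b1_hat * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)                    # about the mean
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted)) # about the line
    ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)         # line about the mean
    return ssto, ssr, sse

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ssto, ssr, sse = sums_of_squares(xs, ys)
print(round(ssto, 4), round(ssr, 4), round(sse, 4))
```

For this data SSTO is 6, SSR is 3.6, and SSE is 2.4, so SSTO = SSR + SSE as expected.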

 

 

Degrees of Freedom (df)

Each sum of squares has an associated number of degrees of freedom

SSTO: df = n - 1

SSR: df = 1

SSE: df = n - 2

Note:

SSTO = SSR + SSE

similarly

df of SSTO = df of SSR + df of SSE

n - 1 = 1 + (n - 2)

 

Mean Sum of Squares

The mean sums of squares are found by dividing each sum of squares by its corresponding degrees of freedom

There are two mean sums of squares

REGRESSION MEAN SUM OF SQUARES (MSR)

ERROR MEAN SUM OF SQUARES (MSE)

MSR = SSR / (df for SSR) = SSR / 1

MSE = $s^2$ = SSE / (df for SSE) = SSE / (n - 2)

MSE is the point estimator of the variance of the error term
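The division by degrees of freedom can be sketched as follows. The inputs SSR = 3.6, SSE = 2.4, and n = 5 are made-up illustrative values, and the function name mean_squares is hypothetical.

```python
from math import sqrt

def mean_squares(ssr, sse, n):
    """Return (MSR, MSE) for simple linear regression with n observations."""
    df_regression = 1      # one slope parameter
    df_error = n - 2       # n observations minus two estimated parameters
    if df_error <= 0:
        raise ValueError("need at least 3 observations to estimate MSE")
    msr = ssr / df_regression
    mse = sse / df_error
    return msr, mse

msr, mse = mean_squares(ssr=3.6, sse=2.4, n=5)
s = sqrt(mse)  # estimated standard deviation of the error term
print(round(msr, 4), round(mse, 4), round(s, 4))
```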

Lecture 36

 

 

Residual Analysis

Residual Analysis is used to evaluate whether or not the model that is fitted to the data is appropriate

Four issues are considered in evaluating whether or not the model is appropriate

  • Linearity - is the relationship between the two variables linear?

  • Constant Variance - does the error term (ei) have constant variance at all levels of X?

  • Normality - are the error terms normally distributed?

  • Independence - are the error terms independent?

Independence can be difficult to assess since it refers to the relationship between the error term and a third variable (not X or Y). One common example is lack of independence with time.
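One informal way to look for dependence on a third variable is to group the residuals by that variable and compare the groups. The sketch below is an assumption about how such a check might be set up, not a formal test; the residuals, the group labels "A" and "B", and the function name mean_residual_by_group are all made up.

```python
from collections import defaultdict

def mean_residual_by_group(residuals, groups):
    """Return the average residual for each level of a third variable."""
    by_group = defaultdict(list)
    for e, g in zip(residuals, groups):
        by_group[g].append(e)
    return {g: sum(values) / len(values) for g, values in by_group.items()}

# Made-up residuals and a hypothetical third variable (for example, operator)
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
groups = ["A", "A", "B", "B", "A"]
means = mean_residual_by_group(residuals, groups)
for g in sorted(means):
    print(g, round(means[g], 3))
```

If one group's residuals are mostly positive and another's mostly negative, the error terms may depend on the grouping variable.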

 

Residual Analysis

Two plots are used for residual analysis

  • Residual Plot - residual vs fitted value

  • Normal Plot - actual residual vs expected residual under normality

Residual plot is used to evaluate linearity, constant variance, and independence (and to some extent normality)

Normal plot is used to evaluate normality :)

 

Normal Plot

Plot of actual vs expected residual under normality

The expected residual under normality for the ith smallest residual is given by

$z\left(\frac{i - 0.5}{n}\right) \times \sqrt{MSE}$

that is, (standard normal value for the ith ranked item) times (standard deviation for the error term)

If we have ten items, what is (i - 0.5)/n for the second smallest?

What is z((i - 0.5)/n) for the second smallest?

What does z((i - 0.5)/n) give us?

Worksheet for calculating expected residual!
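A worksheet of this kind can be sketched in a few lines. This assumes z(p) means the standard normal value with cumulative probability p, computed here with NormalDist from the Python standard library. The residuals and MSE are made-up values, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def expected_residuals(n, mse):
    """Expected residuals under normality: z((i - 0.5)/n) * sqrt(MSE), i = 1..n."""
    s = sqrt(mse)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf((i - 0.5) / n) * s for i in range(1, n + 1)]

def normal_plot_pairs(residuals, mse):
    """Pair each expected residual with the actual residual of the same rank."""
    ordered = sorted(residuals)  # smallest residual gets rank i = 1
    return list(zip(expected_residuals(len(residuals), mse), ordered))

# Made-up residuals and MSE for illustration
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
pairs = normal_plot_pairs(residuals, mse=0.8)
for expected, actual in pairs:
    print(round(expected, 3), actual)
```

Plotting the actual residuals against the expected ones gives the normal plot; points close to a straight line are consistent with normal errors.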

 

Class Work - Extra Credit (due Friday 12/4)

Use the vending machine data for the following

 

wpe3.jpg (10848 bytes)

b0 = 55.74, b1 = 0.0454

I. Plot the residual plot

II. Plot the normal plot

III. Is there evidence of non-linearity, non-constant variance, or non-normality?

IV. Given that machines 1, 2, 3, and 6 are maintained by one contractor and machines 4 and 5 by a second, state whether or not there is evidence of non-independence for the error terms.

 

 

 

 


Date of last update - 30 Nov 1998