Lecture 18-a

Statistical Sampling

Statistical sampling gives information about a POPULATION

Probability theory is used to

Select a representative sample

Determine an appropriate sample size

Analyze the results for reliability

Census vs Sample

Census - information gathered for WHOLE population

Sample - information gathered for selected sub-set of population

 

Population

Population is the total set of elements of interest for a given problem

It is important to properly define the population - but the population of interest is not always obvious or easy to reach

What is the population for election polling?

Population may be FINITE or INFINITE

We have an infinite population when we are trying to study a PROCESS

Examples: construction, production, weather

 

Parameters and Statistics

Population Parameters are such characteristics as average age or variance of age FOR THE POPULATION

Sample statistics are such characteristics as average age or variance of age FOR THE SAMPLE

Are population parameters random variables?

Are sample statistics random variables?

Generally, population parameters are UNKNOWN; sample statistics are used to estimate the population parameters

 

Parameters and Statistics

Example: Population is UCONN undergrads - variable of interest is height of students

Population parameter - mean height of all Undergrads. This is a fixed but unknown value.

Sample statistic - mean height of students in a sample. This is a random variable (that varies depending on the sample selected). Used to estimate the population parameter.
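
The difference between a fixed parameter and a random statistic can be sketched in a few lines of Python. The population below is simulated (5,000 made-up heights in cm, drawn from an assumed normal distribution); it is not real UCONN data, and the sample size of 25 is chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 5,000 made-up students.
# Simulated for illustration only; not real data.
rng = random.Random(18)
population = [rng.gauss(170, 10) for _ in range(5000)]

# Population parameter: one fixed value (unknown in practice).
population_mean = statistics.fmean(population)

# Sample statistic: recomputed for each new sample, so it varies.
sample_means = [
    statistics.fmean(rng.sample(population, 25)) for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f}")
```

Each run of the loop gives a different sample mean, while the population mean stays the same.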

 

Parameters and Statistics

Population Parameters

By convention, Greek letters used for population parameters

Population mean, μ

Population variance, σ² (divide by N)

Sample Statistics

Sample mean, x-bar

Sample variance, s² (divide by n-1)
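
The two divisors can be compared on a small data set. The five ages below are hypothetical; the same values are treated first as a whole population and then as a sample. Python's standard library provides both conventions as statistics.pvariance and statistics.variance.

```python
import statistics

# Hypothetical ages, used only to illustrate the two divisors.
ages = [23, 35, 41, 29, 52]
n = len(ages)
mean = sum(ages) / n
sum_sq = sum((a - mean) ** 2 for a in ages)

# Treating the five values as the WHOLE population: divide by N.
pop_var = sum_sq / n
# Treating the five values as a SAMPLE: divide by n - 1.
samp_var = sum_sq / (n - 1)

# The standard library implements both conventions.
assert abs(pop_var - statistics.pvariance(ages)) < 1e-9
assert abs(samp_var - statistics.variance(ages)) < 1e-9

print(pop_var, samp_var)  # prints 100.0 125.0
```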

 

Sampling Distribution

The Sampling Distribution is the probability distribution of a sample statistic for a given sample size

The sampling distribution is the basis for analyzing statistical sample results. It is used to answer questions such as

How much confidence should we place in our results?

How precise are our results?

Therefore, for sampling it is important to have some sense of the sampling distribution.

An exact sampling distribution can be determined for discrete variables by listing all possible outcomes. For continuous variables the sampling distribution can be determined by experimentation.

 

Sampling Distribution of a Discrete Variable

EX - A coin is tossed twice. The probability distribution for the number of heads is:

x       0       1       2
p(x)    0.25    0.50    0.25

Find the sampling distribution for the sample mean.

List all possible outcomes

Calculate the sample mean associated with each outcome

List probability for each outcome

x-bar       0       0.5     1.0
p(x-bar)    0.25    0.50    0.25
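
The three steps above can be sketched in Python for the two-toss example. The coding of a head as 1 and a tail as 0, and the assumption of a fair coin, are choices made here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# One toss: 1 = head, 0 = tail, each with probability 1/2 (fair coin assumed).
tosses_per_sample = 2

# Step 1: list all possible outcomes (TT, TH, HT, HH).
outcomes = list(product([0, 1], repeat=tosses_per_sample))

# Each of the 4 outcomes is equally likely.
p_outcome = Fraction(1, len(outcomes))

# Steps 2 and 3: compute the sample mean of each outcome and
# accumulate the probability attached to each distinct mean.
sampling_dist = Counter()
for outcome in outcomes:
    xbar = Fraction(sum(outcome), tosses_per_sample)
    sampling_dist[xbar] += p_outcome

for xbar in sorted(sampling_dist):
    print(float(xbar), float(sampling_dist[xbar]))
# prints:
# 0.0 0.25
# 0.5 0.5
# 1.0 0.25
```

Changing tosses_per_sample gives the exact sampling distribution for larger samples in the same way.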

 

Sampling Distribution

What is the population mean for the coin toss problem?

What is the expected value of x-bar?

What is the relationship between population mean and x-bar?

Lecture 18-b

 

Sampling Distribution of the Sample Mean

Before sampling, the sample mean is a random variable with a sampling distribution

The sampling distribution is given by three separate theorems

i) The expected value of the sample mean is equal to the population mean

ii) The variance of the sample mean is equal to the population variance divided by n (the sample size)

iii) For almost all populations, the sampling distribution of the sample mean is approximately normal when the (simple random) sample size is sufficiently large. This theorem is known as the CENTRAL LIMIT THEOREM (CLT)
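
The three theorems can be checked by experimentation. The sketch below assumes a skewed population (exponential with mean 1 and variance 1), a sample size of 50, and 20,000 repeated samples; all of these are illustrative choices, not part of the lecture.

```python
import random
import statistics

rng = random.Random(1998)
n = 50          # sample size
reps = 20000    # number of repeated samples

# Assumed population: exponential with rate 1 (skewed; mean 1, variance 1).
pop_mean, pop_var = 1.0, 1.0

# Draw many samples and record the sample mean of each.
xbars = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_xbars = statistics.fmean(xbars)
var_of_xbars = statistics.variance(xbars)

print(f"(i)   mean of x-bar: {mean_of_xbars:.4f} vs population mean {pop_mean}")
print(f"(ii)  variance of x-bar: {var_of_xbars:.4f} vs sigma^2/n = {pop_var / n}")

# (iii) For a normal distribution, about 68.3% of values fall within
# one standard deviation of the mean.
se = (pop_var / n) ** 0.5
within_1se = sum(abs(x - pop_mean) <= se for x in xbars) / reps
print(f"(iii) share of x-bar within one standard error: {within_1se:.3f}")
```

The simulated values should land close to, but not exactly on, the theoretical ones, because the experiment itself is a sample.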

 

Central Limit Theorem

What is sufficiently large? It depends on the underlying population distribution (is it skewed or symmetrical?)

In general, a sample size larger than 50 is sufficient

Note: If the population is normally distributed then the sampling distribution of x-bar will also be normally distributed regardless of the sample size

For finite populations, the three theorems apply if the sampling fraction (n/N) is 5% or less
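
One way to see where a 5% rule comes from is the standard finite population correction (FPC), which is not stated in these notes: when sampling without replacement, the variance of x-bar is (σ²/n)(N-n)/(N-1). The sketch below checks that formula by exact enumeration on a tiny hypothetical population, then prints the correction factor for several sampling fractions with an assumed N of 1000.

```python
import statistics
from itertools import combinations

def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

# Tiny hypothetical population of ages, small enough to enumerate.
population = [21, 34, 45, 52, 38, 60]
N, n = len(population), 2

# Exact variance of x-bar over every possible sample of size n.
xbars = [statistics.fmean(s) for s in combinations(population, n)]
exact_var = statistics.pvariance(xbars)

# Variance predicted by sigma^2 / n times the correction factor.
formula_var = statistics.pvariance(population) / n * fpc(N, n)
assert abs(exact_var - formula_var) < 1e-9

# Size of the correction for a hypothetical population of 1000.
for size in (10, 50, 200, 500):
    print(f"n/N = {size / 1000:.0%}: FPC = {fpc(1000, size):.3f}")
```

At a sampling fraction of 5% the factor is still above 0.95, so ignoring it changes the variance of x-bar by less than 5%.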

 

Statistical Sampling Project
Estimating Average Age

Project Description

Each person in the group should

select (using random numbers) two independent, random samples from the population. The first sample should contain ten people and the second twenty

calculate the sample mean and the sample variance for each sample (the variable of interest is age).

The population for this project is all employees at The Husky Bank
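
The sampling step can be sketched as follows. The employee list here is a hypothetical stand-in (400 made-up IDs with random ages between 20 and 65); the real Husky Bank list is not reproduced, and draw_sample is an illustrative helper name.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical stand-in for the employee list: ID -> age.
# The real project population is not reproduced here.
employees = {emp_id: rng.randint(20, 65) for emp_id in range(1, 401)}

def draw_sample(n):
    """Pick n distinct employee IDs at random and return their ages."""
    chosen_ids = rng.sample(sorted(employees), n)
    return [employees[i] for i in chosen_ids]

# Two independent samples: ten people, then twenty.
for n in (10, 20):
    ages = draw_sample(n)
    print(f"n = {n}: mean = {statistics.fmean(ages):.2f}, "
          f"variance = {statistics.variance(ages):.2f}")
```

statistics.variance divides by n-1, which matches the sample variance convention used in these notes.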

 

Project Objectives

Show how to do sampling using random numbers

Introduce the concept of population vs. sample

Show that sample statistics are random variables

Show that the sample mean has a probability distribution

Show how the probability distribution of sample mean varies with sample size
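
The last objective can be previewed by repeating the sampling many times. The sketch below uses a hypothetical population of 400 ages and 2,000 repeated samples per sample size; both numbers are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(18)

# Hypothetical population of 400 ages (made up for illustration).
population = [rng.randint(20, 65) for _ in range(400)]

def sample_means(n, reps=2000):
    """Sample means from reps simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# Standard deviation of the sample mean for each sample size.
spread = {n: statistics.stdev(sample_means(n)) for n in (10, 20)}

print(f"population mean: {statistics.fmean(population):.2f}")
for n, sd in spread.items():
    print(f"n = {n}: standard deviation of x-bar = {sd:.2f}")
```

The sample means for n = 20 cluster more tightly around the population mean than those for n = 10, as theorem (ii) predicts.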

 

Date of last update - 16 Oct 1998