Lecture 18-a
Statistical Sampling
Statistical sampling gives information about a POPULATION
Probability theory is used to:
Select a representative sample
Determine an appropriate sample size
Analyze the results for reliability
Census vs Sample
Census - information gathered for the WHOLE population
Sample - information gathered for a selected subset of the population
Population
The population is the total set of elements of interest for a given study
It is important to properly define the population - but the population of interest is not always obvious or easy to reach
What is the population for election polling?
A population may be FINITE or INFINITE
We have an infinite population when we are trying to study a PROCESS
Construction! Production! Weather!
Parameters and Statistics
Population parameters are characteristics, such as the average age or the variance of age, FOR THE POPULATION
Sample statistics are characteristics, such as the average age or the variance of age, FOR THE SAMPLE
Are population parameters random variables?
Are sample statistics random variables?
Generally, population parameters are UNKNOWN, so sample statistics are used to estimate them
Example: the population is UCONN undergraduates; the variable of interest is student height
Population parameter - mean height of all undergraduates. This is a fixed but unknown value.
Sample statistic - mean height of the students in a sample. This is a random variable (its value depends on the sample selected) and is used to estimate the population parameter.
Population Parameters
By convention, Greek letters are used for population parameters
Population mean, $\mu$
Population variance, $\sigma^2$ (divide by N)
Sample Statistics
Sample mean, $\bar{X}$
Sample variance, $s^2$ (divide by n-1)
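A minimal Python sketch of the two variance formulas, using a small set of hypothetical ages (illustrative values, not lecture data). The helper names `population_variance` and `sample_variance` are made up for this sketch; the standard-library `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n-1) are used only as a cross-check. Dividing by n-1 is what makes $s^2$ an unbiased estimator of $\sigma^2$.

```python
from statistics import pvariance, variance

# Hypothetical ages, for illustration only.
ages = [20, 30, 40, 50]

def population_variance(xs):
    """sigma^2: mean squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from x-bar, dividing by n - 1."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

print(population_variance(ages), pvariance(ages))  # both divide by N
print(sample_variance(ages), variance(ages))       # both divide by n - 1
```

For these four values the squared deviations sum to 500, so $\sigma^2 = 500/4 = 125$ while $s^2 = 500/3 \approx 166.7$.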
Sampling Distribution
The Sampling Distribution is the probability distribution of a sample statistic for a given sample size
The sampling distribution is the basis for analyzing statistical sample results. It is used to answer questions such as
How much confidence should we place in our results?
How precise are our results?
Therefore, for sampling it is important to have some sense of the sampling distribution.
An exact sampling distribution can be determined for discrete variables by enumerating every possible sample. For continuous variables, the sampling distribution can be approximated by experimentation (drawing many samples and recording the statistic each time).
Sampling Distribution of a Discrete Variable
EX - A fair coin is tossed twice. The probability distribution for the number of heads is:
x 0 1 2
p(x) 0.25 0.50 0.25
Find the sampling distribution of the sample mean, scoring each toss as 1 for heads and 0 for tails (sample size n = 2).
List all possible outcomes (TT, TH, HT, HH)
Calculate the sample mean associated with each outcome
List the probability of each outcome, combining outcomes that give the same sample mean
xbar 0 0.5 1.0
p(xbar) 0.25 0.50 0.25
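The three steps can be checked by brute-force enumeration. This is a Python sketch assuming a fair coin with each toss scored 1 for heads and 0 for tails; `sampling_distribution_of_mean` is a hypothetical helper name. Exact fractions avoid floating-point rounding in the probabilities.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population: a single toss of a fair coin, scored 1 = heads, 0 = tails.
toss_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}
n = 2  # sample size: the coin is tossed twice

def sampling_distribution_of_mean(dist, n):
    """Exact sampling distribution of x-bar for n independent draws from dist."""
    result = defaultdict(Fraction)  # Fraction() == 0, so missing keys start at 0
    # Step 1: list all possible outcomes (ordered n-tuples of values).
    for outcome in product(dist, repeat=n):
        # Step 2: sample mean associated with this outcome.
        xbar = Fraction(sum(outcome), n)
        # Step 3: probability of this outcome (independent tosses),
        # accumulated over outcomes that share the same x-bar.
        p = Fraction(1)
        for x in outcome:
            p *= dist[x]
        result[xbar] += p
    return dict(sorted(result.items()))

for xbar, p in sampling_distribution_of_mean(toss_dist, n).items():
    print(float(xbar), float(p))
```

Changing `n` gives the exact sampling distribution for larger samples, at the cost of enumerating 2**n outcomes.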
Sampling Distribution
What is the population mean for the coin toss problem?
What is the expected value of x-bar?
What is the relationship between population mean and x-bar?
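A worked version of these questions, assuming (as the x-bar table implies) that the population is a single toss scored 1 for heads and 0 for tails, each with probability 0.5:

```latex
\mu = \sum x\,p(x) = 0(0.5) + 1(0.5) = 0.5
E(\bar{X}) = \sum \bar{x}\,p(\bar{x}) = 0(0.25) + 0.5(0.50) + 1.0(0.25) = 0.5
```

So the expected value of x-bar equals the population mean, which is theorem i of Lecture 18-b.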
Lecture 18-b
Sampling Distribution of the Sample Mean
Before sampling, the sample mean is a random variable with a sampling distribution
The sampling distribution is given by three separate theorems
i) The expected value of the sample mean is equal to the population mean: $E(\bar{X}) = \mu$
ii) The variance of the sample mean is equal to the population variance divided by n (the sample size): $\mathrm{Var}(\bar{X}) = \sigma^2/n$
iii) For almost all populations, the sampling distribution of the sample mean is approximately normal when the (simple random) sample size is sufficiently large. This theorem is known as the CENTRAL LIMIT THEOREM (CLT)
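Theorems i and ii can be checked by experimentation, the approach used for continuous variables. A rough Python sketch, assuming a hypothetical exponential population with mean 1 and variance 1 (a deliberately skewed choice), a sample size of 10, and 20,000 repeated samples; all of these numbers are illustrative, not from the lecture.

```python
import random
import statistics

random.seed(18)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with mean 1 and variance 1.
MU, SIGMA2 = 1.0, 1.0
n = 10         # sample size
reps = 20_000  # number of repeated samples

def sample_mean(n):
    """Draw one simple random sample of size n and return its mean."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Each entry is one observed value of the random variable x-bar.
xbars = [sample_mean(n) for _ in range(reps)]

print("mean of x-bar:", statistics.fmean(xbars), "vs mu:", MU)
print("variance of x-bar:", statistics.pvariance(xbars), "vs sigma^2/n:", SIGMA2 / n)
```

The simulated mean of x-bar should land close to 1 and its variance close to 0.1; a histogram of `xbars` would also look roughly bell-shaped (theorem iii).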
Central Limit Theorem
How large is sufficiently large? It depends on the underlying population distribution (is it skewed or symmetric?)
In general, a sample size larger than 50 is sufficient
Note: If the population is normally distributed, then the sampling distribution of x-bar is also normally distributed regardless of the sample size
For finite populations, the three theorems apply if the sampling fraction (n/N) is 5% or less
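A sketch of why "sufficiently large" depends on skewness, again assuming a hypothetical exponential(1) population, which is strongly right-skewed. The skewness of x-bar (0 for a normal distribution) shrinks as n grows; for this population it is theoretically 2/√n, about 1.41, 0.63 and 0.28 for n = 2, 10 and 50. The helper names are illustrative.

```python
import random
import statistics

random.seed(71)  # fixed seed so the run is reproducible

def skewness(xs):
    """Sample skewness: average cubed standardized deviation (0 if symmetric)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def xbar_skewness(n, reps=20_000):
    """Skewness of x-bar over many samples of size n from an exponential(1) population."""
    xbars = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(xbars)

for n in (2, 10, 50):
    print(n, round(xbar_skewness(n), 2))
```

A more skewed population would need a larger n to reach the same near-normal shape; a symmetric one needs less.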
Statistical Sampling Project
Estimating Average Age
Project Description
Each person in the group should:
select (using random numbers) two independent random samples from the population. The first sample should contain ten people and the second twenty.
calculate the sample mean and the sample variance for each sample (the variable of interest is age).
The population for this project is all employees at The Husky Bank
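A minimal sketch of the selection step in Python. The Husky Bank employee list is not part of these notes, so the sketch uses a HYPOTHETICAL population of 500 employees numbered 1 to 500 with randomly generated ages; `draw_sample` is an illustrative helper. `random.sample` plays the role of the random number table: it picks distinct employee numbers, giving a simple random sample without replacement.

```python
import random
import statistics

# HYPOTHETICAL stand-in for the employee list: employee number -> age.
# These ages are randomly generated, not real project data.
random.seed(1998)
population = {emp_id: random.randint(20, 65) for emp_id in range(1, 501)}

def draw_sample(pop, n):
    """Select n distinct employee numbers at random and return their ages."""
    ids = random.sample(sorted(pop), n)
    return [pop[i] for i in ids]

for n in (10, 20):  # first sample: ten people; second sample: twenty
    ages = draw_sample(population, n)  # drawn separately, so the samples are independent
    # statistics.variance divides by n - 1, matching s^2.
    print(f"n={n}: mean={statistics.mean(ages):.2f}, variance={statistics.variance(ages):.2f}")
```

With 500 employees, the larger sample is 4% of the population, inside the 5% guideline for finite populations.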
Project Objectives
Show how to do sampling using random numbers
Introduce the concept of population vs. sample
Show that sample statistics are random variables
Show that the sample mean has a sampling distribution
Show how the probability distribution of the sample mean varies with sample size
Date of last update - 16 Oct 1998