Distributions and sampling
1. Distributions and sampling
One meaning of the term distribute is to spread around, as in divide and share.
Some international relief programs distribute food to those who need it.
A distribution system is a way or process of distributing something.
Many companies have distribution warehouses as part of their distribution system.
In statistics, a distribution is the way in which data values are put into buckets in order to summarize the data in a meaningful way.
Distributions can be discrete (i.e., counts) or continuous (i.e., measures).
If there are enough values, the discrete counts can be approximated by a continuous measure.
2. Uses
Statistical distributions have many uses.
Fitting and analyzing sampled data with a distribution
Generating sample data for testing purposes (e.g., privacy considerations)
Generating test data can often involve creating sample text.
3. Chebyshev's theorem
Pafnuty Chebyshev (mathematician) developed a theorem that says that, for any set of data values, the proportion of values that lie within k standard deviations of the mean is at least 1 - 1/k², where k is any constant greater than 1.
Russian name: Пафну́тий Чебышёв
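The bound can be checked on data that is not bell-shaped. Here is a minimal sketch, assuming exponentially distributed test data and a fixed random seed (both are choices made for this illustration only).

```python
import random
import statistics

def fraction_within(values, k):
    """Return the fraction of values within k population standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
data = [rng.expovariate(1.0) for _ in range(10_000)]  # skewed, not bell-shaped

for k in (1.5, 2, 3):
    bound = 1 - 1 / k**2  # Chebyshev's lower bound
    print(f"k={k}: observed {fraction_within(data, k):.3f}, bound {bound:.3f}")
```

The observed proportion should never fall below the bound, although for most data sets it is well above it.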
4. Bell shape
What is a bell shape?
The Liberty Bell is an example of a bell-shaped object.
Many frequency distributions are bell-shaped, or can be approximated by a bell-shaped distribution.
5. Confidence intervals
For a symmetrical, bell-shaped, frequency distribution, the following are true.
About 68.0% of the observations will be within one standard deviation, σ, of the mean.
About 95.0% of the observations will be within two standard deviations of the mean.
About 99.7% of the observations will be within three standard deviations of the mean.
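These percentages can be reproduced for the normal distribution. Here is a small sketch using the standard library class statistics.NormalDist. The helper name coverage is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.4f}")
```

The result does not depend on the particular mean or standard deviation, which is why the standard normal distribution is used here.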
6. Six Sigma
Six Sigma is a quality control technique based on controlling variance measurements.
7. Central limit theorem
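The central limit theorem describes the distribution of sample means. It is a theorem about sampling, not itself a measure of central tendency.
The central limit theorem states that when random samples of a given size are taken from a population, the distribution of the sample means can be approximated by a normal distribution, and the approximation improves as the sample size grows. Taking more samples makes the observed distribution of the sample means smoother, so the bell shape is easier to see.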
8. Simulation
Simulation techniques can be used to simulate many samples from a given distribution.
9. Normal distribution
What happens if samples are taken from a population that is normally distributed?
10. Sampling from a normal distribution
Here is the result of sampling from a normal distribution.
The distribution of the sample means appears to be normally distributed.
This is to be expected.
Notice that the distribution of the sample means is narrower and taller. Since the heights of all boxes (in this discrete approximation) sum to 1.0, the standard deviation of the sample means must be smaller than the standard deviation of the population.
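How much smaller? A standard result is that the standard deviation of the sample means is σ/√n for samples of size n. Here is a sketch that compares this prediction with a simulation. The population values (mean 50, standard deviation 15), the sample size, the number of samples, and the seed are all assumptions made for this example.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed, an arbitrary choice for repeatability
MU, SIGMA = 50.0, 15.0  # assumed population mean and standard deviation
SAMPLE_SIZE, NUM_SAMPLES = 16, 1000  # assumed for this illustration

sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

observed_sd = statistics.stdev(sample_means)
predicted_sd = SIGMA / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

print(f"population sd:          {SIGMA:.2f}")
print(f"predicted sd of means:  {predicted_sd:.2f}")
print(f"observed sd of means:   {observed_sd:.2f}")
```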
11. Uniform distribution
Suppose that a real-world process is modeled by a uniform distribution in the range 30.0 to 60.0.
One example of a uniform distribution is a truly random number generation process within the range of interest, here 30.0 to 60.0.
Suppose that 1000 samples are to be taken where each sample consists of 16 values selected at random from the distribution and averaged to get a sample mean. Of course, there are random errors inherent in each sample of 16 values, but these random errors can be minimized by taking more samples, which is why 1000 samples are taken. Then create a bar chart of the distribution of the means of the 1000 samples. What is the distribution of the sample means?
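The experiment just described can be sketched in a few lines. The range (30.0 to 60.0), the 1000 samples, and the sample size of 16 come from the text. The seed, the bucket width of 1.0, and the text-only bar chart are choices made for this illustration.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability
LOW, HIGH = 30.0, 60.0
NUM_SAMPLES, SAMPLE_SIZE = 1000, 16

def sample_mean():
    """Average SAMPLE_SIZE values drawn uniformly from the range LOW to HIGH."""
    return statistics.fmean([rng.uniform(LOW, HIGH) for _ in range(SAMPLE_SIZE)])

sample_means = [sample_mean() for _ in range(NUM_SAMPLES)]

# Text bar chart: one row per bucket of width 1.0, one '#' per 5 sample means.
buckets = Counter(int(m) for m in sample_means)
for bucket in sorted(buckets):
    print(f"{bucket:2d} {'#' * (buckets[bucket] // 5)}")

print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 2))
```

The bars should pile up near the middle of the range, 45, even though every individual value is equally likely to fall anywhere between 30.0 and 60.0.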
12. Sampling from uniform distribution
Here is the result of sampling from a uniform distribution.
The sample means appear to have a bell-shaped distribution. This bell-shaped distribution is called the normal distribution. It is centered on the population mean, which is a measure of central tendency.
The central limit theorem states that the distribution of the sample means can be approximated by a normal distribution when the sample size is large enough, even if the population distribution (in this case, the uniform distribution) is not normally distributed. Thus, the central limit theorem has great importance since it means that the normal distribution has useful applications in practice.
What is the importance of the central limit theorem?
13. Exponential distribution
Consider the following exponential distribution.
What happens if samples are taken from this exponential distribution?
14. Sampling from exponential distribution
Here is a chart of the results of 2000 samples of size 16 from an exponential distribution.
The distribution of the sample means appears to be normally distributed.
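One way to check the bell shape without a chart is to count how many sample means fall within one, two, and three standard deviations of their center and compare with the 68%, 95%, and 99.7% figures for a bell-shaped distribution. Here is a sketch, assuming a rate of 1.0 for the exponential distribution (the chart's actual rate is not given) and an arbitrary seed.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed, an arbitrary choice for repeatability
RATE = 1.0  # assumed rate; the population mean is then 1 / RATE
NUM_SAMPLES, SAMPLE_SIZE = 2000, 16

sample_means = [
    statistics.fmean([rng.expovariate(RATE) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

for k in (1, 2, 3):
    share = sum(abs(m - center) <= k * spread for m in sample_means) / NUM_SAMPLES
    print(f"within {k} sd of the center: {share:.1%}")
```

With a sample size of only 16 the sample means are still slightly skewed to the right, so the shares are close to, but not exactly, the bell-shaped figures.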
15. Discrete distribution
Consider the following discrete distribution.
What happens if samples are taken from this discrete distribution?
16. Sampling from a discrete distribution
Here is the chart of the results of 2000 samples of size 32 from the very non-normal discrete distribution.
The distribution of the sample means appears to be normally distributed.
The standard deviation of the population would be quite large, as the values are only at the extremes (low and high) of the possible values.
Thus, the standard deviation of the sample means appears to be less than the standard deviation of the population.
Suppose that the middle values with 0.0 probability are omitted. An intuitive analogy at this point is to compare this distribution with the binomial distribution of a biased coin.
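The biased-coin analogy can be made exact. If the population has only a low value and a high value, the number of high values in a sample of size 32 follows a binomial distribution, so the distribution of the sample mean can be computed without simulation. Here is a sketch. The two values (0.0 and 10.0) and the probability 0.3 of the high value are hypothetical, since the actual distribution in the chart is not given.

```python
import math

LOW_VALUE, HIGH_VALUE = 0.0, 10.0  # hypothetical extreme values
P_HIGH = 0.3                       # hypothetical probability of the high value
SAMPLE_SIZE = 32

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

population_mean = LOW_VALUE * (1 - P_HIGH) + HIGH_VALUE * P_HIGH
population_sd = (HIGH_VALUE - LOW_VALUE) * math.sqrt(P_HIGH * (1 - P_HIGH))

# Exact distribution of the sample mean: k high values out of SAMPLE_SIZE.
mean_distribution = {
    LOW_VALUE + (HIGH_VALUE - LOW_VALUE) * k / SAMPLE_SIZE: binomial_pmf(k, SAMPLE_SIZE, P_HIGH)
    for k in range(SAMPLE_SIZE + 1)
}
mean_of_means = sum(value * prob for value, prob in mean_distribution.items())
sd_of_means = math.sqrt(
    sum((value - mean_of_means) ** 2 * prob for value, prob in mean_distribution.items())
)

print(f"population: mean {population_mean:.2f}, sd {population_sd:.2f}")
print(f"sample means: mean {mean_of_means:.2f}, sd {sd_of_means:.2f}")
```

The standard deviation of the sample means is the population standard deviation divided by √32, which is why the chart of sample means is so much narrower than the population.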
17. Central limit theorem
As can be seen, the central limit theorem is important in that it shows that normal distributions can be used to model the distribution of the sample means.
18. Statistical animations
Here are some animations of some statistical concepts that are relevant to a data science course.
19. Change in mean
As the mean changes, the distribution keeps the same shape (i.e., the same standard deviation) and only shifts left or right.
In this animation, the mean μ ranges from 40 to 60 while the standard deviation σ is 15.
20. Change in standard deviation
As the standard deviation σ (sigma) changes, the mean μ (mu) stays the same, but the shape of the curve changes. Since the area under the curve is always 1.0, the curve gets wider and flatter as the standard deviation increases and taller and narrower as the standard deviation decreases.
21. Normal curve shapes
The area under a probability curve is 1.0.
As the standard deviation σ (sigma) increases, the curve gets wider and shorter.
As the standard deviation σ (sigma) decreases, the curve gets narrower and taller.
As the mean μ (mu) increases, the curve shifts right.
As the mean μ (mu) decreases, the curve shifts left.
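These statements can be checked numerically. The peak of a normal curve is at the mean and has height 1/(σ√(2π)), so doubling σ halves the peak. Here is a sketch using statistics.NormalDist. The mean of 50, the list of σ values, and the simple midpoint-rule area estimate are choices made for this illustration.

```python
from statistics import NormalDist

def peak_and_area(mu, sigma, steps_per_sigma=100):
    """Return the curve height at the mean and a midpoint-rule estimate of the area."""
    dist = NormalDist(mu, sigma)
    step = sigma / steps_per_sigma
    start = mu - 8 * sigma  # integrate over mu plus or minus 8 sigma
    area = sum(dist.pdf(start + (i + 0.5) * step) for i in range(16 * steps_per_sigma)) * step
    return dist.pdf(mu), area

peaks, areas = {}, {}
for sigma in (5.0, 10.0, 15.0, 20.0):
    peaks[sigma], areas[sigma] = peak_and_area(50.0, sigma)
    print(f"sigma={sigma:4.1f}: peak height {peaks[sigma]:.4f}, area {areas[sigma]:.4f}")
```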
22. Poisson distribution
Here is an animation of how the Poisson distribution varies as μ (mu) is changed.
Notice that as μ (mu) increases, the Poisson distribution resembles the discrete binomial and the continuous normal distribution.
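The resemblance to the normal distribution can be measured. A Poisson distribution with mean μ has standard deviation √μ, so it can be compared with the normal curve that has the same mean and standard deviation. Here is a sketch that reports the largest gap between the two. The μ values tried and the helper names are made up for this example.

```python
import math
from statistics import NormalDist

def poisson_pmf(k, mu):
    """Poisson probability of the count k, computed in log space to avoid overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def max_gap(mu):
    """Largest absolute difference between the Poisson pmf and the matching normal pdf."""
    normal = NormalDist(mu, math.sqrt(mu))
    upper = int(mu + 10 * math.sqrt(mu)) + 1  # far enough into the right tail
    return max(abs(poisson_pmf(k, mu) - normal.pdf(k)) for k in range(upper + 1))

gaps = {mu: max_gap(mu) for mu in (1, 5, 20, 50)}
for mu, gap in gaps.items():
    print(f"mu={mu:2d}: largest gap {gap:.4f}")
```

The gap shrinks as μ grows, which matches what the animation shows.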
23. Chi-square distribution
Here is an animation to show how the χ² (chi-squared) distribution approaches the normal distribution as n gets large.
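One way to see this without an animation is to watch the skewness, which is 0.0 for a normal distribution. Here is a sketch, assuming that n is the number of degrees of freedom and generating each chi-squared value as a sum of n squared standard normal values. The n values, the number of values generated, and the seed are choices made for this illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed, an arbitrary choice for repeatability
NUM_VALUES = 2000

def chi_square_value(n):
    """One chi-squared value with n degrees of freedom (sum of n squared standard normals)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def skewness(values):
    """Sample skewness; near 0.0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

skew_by_n = {}
for n in (1, 4, 16, 64):
    values = [chi_square_value(n) for _ in range(NUM_VALUES)]
    skew_by_n[n] = skewness(values)
    print(f"n={n:2d}: mean {statistics.fmean(values):6.2f}, skewness {skew_by_n[n]:.2f}")
```

The mean stays close to n while the skewness moves toward 0.0, so the curve becomes more symmetric and bell-shaped as n gets large.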