Measurement and error
1. Measurement and error
An important part of data science and statistics is measurement (via samples and data) and error.
2. Errors
An error is a difference between what is expected and what is found to exist.
That is, an error is something that is not correct, according to some definition of correctness.
3. Bugs and features
In a computer program, an error is often called a bug.
It is called a feature if you are responsible for the error but do not want to call it an error.
Some errors are more important than other errors.
4. Finding errors
There are many types of errors. Some errors are hard to find.
There is three errers in this sentence.
Find all of them.
5. Precision and accuracy
The goal is often to determine how close the sample value is to the corresponding actual population value.
Two concepts that are important in this measuring process are accuracy and precision.
Accuracy has to do with how close the results are to the actual value.
Something is accurate if it is true, according to some definition of truth.
Something is precise if it is very clear and to the point, without much variation.
Compare and contrast accuracy and precision. Give one specific example and use it to illustrate both concepts.
6. What is the accuracy/precision?
Precision is how well-defined (closely grouped) the results (experiments) are.
For each of the following images, answer these questions. (Note: The images appear only in class, not online.)
Is it precise?
Is it accurate?
What are the possibilities?
7. Four possibilities
There are four possibilities.
precise and accurate
precise and not accurate
not precise and accurate
not precise and not accurate
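The four possibilities can be sketched with simulated measurements. This is a minimal illustration, not part of the original notes: the true value of 100, the bias of 5, and the spreads of 0.5 and 5 are all made-up numbers. Accuracy is read from how far the average lands from the true value, and precision from how tightly the measurements are grouped.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical actual population value


def simulate(bias, spread, n=1000, seed=0):
    """Return n simulated measurements with the given bias and spread."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0, spread) for _ in range(n)]


def describe(measurements):
    """Return (offset of the mean from the true value, standard deviation)."""
    offset = statistics.mean(measurements) - TRUE_VALUE
    spread = statistics.stdev(measurements)
    return offset, spread


# Assumed bias and spread values, chosen only to separate the four cases.
cases = {
    "precise and accurate": simulate(bias=0.0, spread=0.5),
    "precise and not accurate": simulate(bias=5.0, spread=0.5),
    "not precise and accurate": simulate(bias=0.0, spread=5.0),
    "not precise and not accurate": simulate(bias=5.0, spread=5.0),
}

for name, data in cases.items():
    offset, spread = describe(data)
    print(f"{name:30s} offset={offset:+.2f} spread={spread:.2f}")
```

A small offset suggests accuracy and a small spread suggests precision. The two are independent, which is why all four combinations can occur.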
8. Precise and/or accurate
9. Random and systematic errors
In terms of analyzing data, two important error concepts are random errors and systematic errors.
In a measuring process, errors can be categorized as either random errors or systematic errors.
A logical error is another type of error but is not addressed here.
10. Random errors
A random error is due to random fluctuations in the measuring process.
Thus, some measurements will be too high and some too low, but they should average out. Random errors can be reduced by using larger sample sizes.
Cause: random fluctuations
Solution: take a larger sample
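The effect of sample size on random error can be sketched with a small simulation. The true value of 50 and the noise level of 4 are assumptions made up for illustration. The function repeats an experiment many times and reports how far, on average, the sample mean lands from the true value.

```python
import random
import statistics


def mean_error(sample_size, trials=200, seed=1):
    """Average absolute error of the sample mean over repeated experiments."""
    rng = random.Random(seed)
    true_value = 50.0  # hypothetical actual value
    noise = 4.0        # hypothetical standard deviation of the random error
    errors = []
    for _ in range(trials):
        sample = [true_value + rng.gauss(0, noise) for _ in range(sample_size)]
        errors.append(abs(statistics.mean(sample) - true_value))
    return statistics.mean(errors)


for n in (10, 100, 1000):
    print(n, round(mean_error(n), 3))
```

Under these assumptions the error shrinks as the sample grows, roughly in proportion to one over the square root of the sample size.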
11. Systematic errors
A systematic error is due to a systematic bias in the measuring process.
Taking more samples does not improve the accuracy because every measurement shares the same bias. You need to carefully analyze the measuring process for systematic errors.
Cause: systematic bias in the measurement process
Solution: careful analysis of the measurement process
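A systematic bias can be sketched the same way. Here a hypothetical instrument reads 2 units too high on every measurement. The true value of 50, the bias of 2, and the noise level of 4 are all assumptions chosen for illustration.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical actual value


def measured_mean(sample_size, bias, seed=2):
    """Mean of simulated measurements that share a constant bias."""
    rng = random.Random(seed)
    noise = 4.0  # hypothetical standard deviation of the random error
    sample = [TRUE_VALUE + bias + rng.gauss(0, noise) for _ in range(sample_size)]
    return statistics.mean(sample)


for n in (10, 1000, 100000):
    # The remaining offset from the true value stays near the bias.
    print(n, round(measured_mean(n, bias=2.0) - TRUE_VALUE, 3))
```

As the sample grows, the random part averages out but the offset settles near the bias rather than near zero. Only finding and correcting the bias itself removes it.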
Will I get a more accurate measurement if I use a ruler to measure something many times?
Compare and contrast random errors and systematic errors in a measuring process.
12. Error examples
A tire company has made experimental measurements to predict the lifetime of their manufactured tires. What exactly is the relationship between manufactured tires and tire lifetime? Is it predictable? What errors are involved?
13. Tires
Assume that a tire company has made experimental measurements to predict the lifetime of their manufactured tires.
Categorize the following observations as being either random or systematic errors.
14. Wear and tear
Some people put more wear and tear on their tires.
Is this a random or a systematic error?
This is a random error.
Some people create more wear and tear than others.
15. Not using the warranty
Some people do not bother to return their tires even though they are under warranty.
Is this a random or a systematic error?
This is a systematic error.
There is not a corresponding group of people who return their tires when they are no longer under warranty.
16. Some tires last longer
Some tires last longer than other tires.
Is this a random or a systematic error?
This is a random error.
A larger sample will average out this effect (i.e., the Law of Large Numbers; the Central Limit Theorem describes how the sample mean varies around the true mean).
17. Limited warranty
Some warranties are limited to the original buyer of the tire.
Aside: What is the difference between a warranty and a guarantee? How did we get both words? Hint: It is related to words such as ward and guard, war and guerrilla (not gorilla), etc.
Is this a random or a systematic error?
This is a systematic error.
This policy systematically eliminates an entire segment of possible tire returns under warranty.
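The tire observations can be combined in one rough simulation. Every number here is invented for illustration: a 40,000-mile warranty, a mean lifetime of 50,000 miles with a tire-to-tire spread of 10,000 miles, and a 60% chance that an owner with a failed tire actually returns it. Tire-to-tire variation plays the role of the random error, and unreturned tires play the role of the systematic error.

```python
import random


def failure_rates(n_tires, return_probability, seed=3):
    """Return (true, reported) fractions of tires failing under warranty."""
    rng = random.Random(seed)
    warranty_miles = 40_000  # hypothetical warranty length
    mean_life = 50_000       # hypothetical mean tire lifetime
    life_sd = 10_000         # hypothetical tire-to-tire variation (random)
    true_failures = 0
    reported_failures = 0
    for _ in range(n_tires):
        lifetime = rng.gauss(mean_life, life_sd)
        if lifetime < warranty_miles:
            true_failures += 1
            # Some owners never return the tire (systematic undercount).
            if rng.random() < return_probability:
                reported_failures += 1
    return true_failures / n_tires, reported_failures / n_tires


for n in (100, 10_000, 100_000):
    true_rate, reported_rate = failure_rates(n, return_probability=0.6)
    print(n, round(true_rate, 3), round(reported_rate, 3))
```

With more tires the true failure fraction settles down, but the reported fraction stays below it, because no group of owners reports failures that did not happen.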
18. Coronavirus deaths
In Spring 2020, the Coronavirus deaths in the United States were (as with other flu seasons in the past) counted as follows.
For any patient who has coronavirus, the cause of death is attributed to coronavirus regardless of the underlying cause (of death).
What is the type of measurement error that best represents this way of counting deaths due to Coronavirus?
19. Signal and noise
Nate Silver wrote the important book "The Signal and the Noise: Why So Many Predictions Fail - but Some Don't".
An important part of data science and statistics is extracting the useful signal from the noise.
20. Claude Shannon
Claude Shannon's groundbreaking work on information theory, including information entropy, had to do with data signals and data noise.
Noise is anything that distorts the information part of a signal or keeps it from being transmitted clearly.
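The information entropy mentioned above can be computed with Shannon's formula, H = sum of p * log2(1/p) over the symbols, where p is the fraction of the message made up of each symbol. The function name and the sample messages below are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy("aaaa"))  # one symbol, no uncertainty: 0.0
print(shannon_entropy("abab"))  # two equally likely symbols: 1.0
print(shannon_entropy("abcd"))  # four equally likely symbols: 2.0
```

The more evenly the symbols are spread, the more bits each symbol carries on average.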
What is a common example of noise distorting a signal and how can it be eliminated?
21. Side conversations
Example: Side conversations in class create noise that is distracting to students trying to listen to the professor. It also distracts the students having the side conversation from paying full attention to the professor.
How might such noise be eliminated?