Data clustering
1. Data clustering
Clustering is an important technique for grouping data. Some examples include the following.
Customer/market segmentation
Graphics such as computer vision
2. Exact and approximate clustering
Clustering is based on some idea of what it means for two entities to be "equal" or, in most cases, "almost equal" in some sense.
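To make the difference between "equal" and "almost equal" concrete, here is a minimal sketch on one-dimensional numbers. It is not the author's method. The function names and the tolerance parameter `tol` are hypothetical and chosen only for illustration. Exact grouping uses `==` through dictionary keys. Approximate grouping treats two neighboring sorted values as "almost equal" when they differ by at most `tol`.

```python
def exact_clusters(values):
    """Group values that are exactly equal (dictionary keys compare with ==)."""
    groups = {}
    for v in values:
        groups.setdefault(v, []).append(v)
    return list(groups.values())


def approximate_clusters(values, tol):
    """Group sorted values whose neighbors differ by at most tol.

    Hypothetical rule for illustration: this is only one of many
    possible notions of "almost equal".
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= tol:
            clusters[-1].append(v)   # close enough to the previous value
        else:
            clusters.append([v])     # start a new cluster
    return clusters


data = [1.0, 1.0, 1.05, 5.0, 5.1, 9.7]
print(exact_clusters(data))             # [[1.0, 1.0], [1.05], [5.0], [5.1], [9.7]]
print(approximate_clusters(data, tol=0.2))  # [[1.0, 1.0, 1.05], [5.0, 5.1], [9.7]]
```

Exact equality splits 1.0 and 1.05 apart. The approximate rule puts them together. Because the rule chains neighbors, two values farther apart than `tol` can still end up in the same cluster. Choosing the notion of "almost equal" is a central design decision in clustering.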
We will look at the following.
Clustering and visualization in general
Exact clustering (when possible)
Approximate clustering
3. Human visualization
Humans have a unique ability to abstract, to recognize patterns, and to make abstract inferences from the patterns they recognize.
4. Abstraction
To abstract is to take away everything but the essentials and thereby to ignore certain differences.
The similarity is what is the same. The difference is what is different.
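One simple way to picture "what is the same" and "what is different" is to describe two objects by sets of features. This is a minimal sketch, and the feature sets below are made up for illustration. Set intersection gives the similarity (features both share). Symmetric difference gives the difference (features only one of them has).

```python
# Hypothetical feature sets describing two objects.
cat = {"four legs", "fur", "tail", "meows"}
dog = {"four legs", "fur", "tail", "barks"}

similarity = cat & dog   # intersection: features both share
difference = cat ^ dog   # symmetric difference: features only one has

print(sorted(similarity))  # ['four legs', 'fur', 'tail']
print(sorted(difference))  # ['barks', 'meows']
```

Abstracting to "four-legged furry animal" keeps the similarity and ignores the difference.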
Human brains are built for complex abstraction.
The Latin word "abstractus" ≈ "take away from". In abstract art, something is taken away and something remains; one must then interpret what is meant or intended.
Define abstraction and give a specific example.
5. Higher level intelligence
Abstraction is the key to higher-level intelligence. That is why so many questions are of the form "What is the primary similarity and difference between ...?"
Much of computer science, and of programming languages in particular, involves looking at patterns in text and making abstractions.
6. Triangles: Seeing and thinking
How many triangles do you see? There are no triangles! Your brain makes the triangles using abstraction (built into the brain).
Programming a computer involves a great deal of abstraction over the text of the code, often without thinking step by step like a computer.
7. Abstraction
In simple terms, abstraction is looking at similarities and ignoring differences.
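In data terms, "looking at similarities and ignoring differences" can be written as a key function. Items with the same key are treated as the same and grouped together. This is a minimal sketch. The helper `group_by` and the sample words are hypothetical. The choice of key is the abstraction: a coarser key ignores more differences and produces fewer, larger groups.

```python
from collections import defaultdict


def group_by(items, key):
    """Group items whose key(item) is the same.

    The key function is the abstraction: it keeps the similarity we care
    about and ignores every other difference between items.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


words = ["Apple", "apple ", "APPLE", "Avocado", "banana"]

# Ignore differences in letter case and surrounding spaces.
by_word = group_by(words, key=lambda w: w.strip().lower())
print(by_word)
# {'apple': ['Apple', 'apple ', 'APPLE'], 'avocado': ['Avocado'], 'banana': ['banana']}

# A coarser abstraction: ignore everything except the first letter.
by_initial = group_by(words, key=lambda w: w.strip()[0].lower())
print(by_initial)
# {'a': ['Apple', 'apple ', 'APPLE', 'Avocado'], 'b': ['banana']}
```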
Abstraction arises from a recognition of similarities between certain objects, situations, or processes in the real world, and the decision to concentrate on these similarities, and to ignore for the time being the differences. Tony Hoare (British computer scientist)
Dahl, O., Dijkstra, E., & Hoare, C. (1972). Structured programming. New York: Academic Press, p. 83.
8. Programming abstractions
In programming terms, to abstract is to replace one or more parts of a program with a name that refers to the replaced parts (thus hiding the details). Here are some programming constructs that are used for abstraction.
constants and variables
procedures/functions with parameters
modules
objects and classes
... and many other concepts ...
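Here is a small sketch of three of the constructs listed above. Each one replaces repeated details with a name. The circle-area example and all names in it are hypothetical. Modules are not shown. The named constant replaces a literal. The function's parameter captures the one thing that differs between uses. The class bundles the data with the operation on it.

```python
# Without abstraction: the same details are written out each time.
area_a = 3.14159 * 2.0 * 2.0
area_b = 3.14159 * 5.0 * 5.0

# Constant: a name replaces a literal value.
PI = 3.14159


# Function with a parameter: a name replaces a repeated computation,
# and the parameter captures what differs between the uses.
def circle_area(radius):
    return PI * radius * radius


# Class: a name bundles data (the radius) with an operation on it.
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return circle_area(self.radius)


print(circle_area(2.0) == area_a)   # True
print(Circle(5.0).area() == area_b)  # True
```

The abstracted versions compute the same values, but the details (the value of PI and the formula) are now written in one place and hidden behind names.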
9. Dimensions
Humans can easily visualize 2D or 3D in graphics, but higher dimensions are harder to visualize.
In data science, one often learns concepts using examples in 2D or 3D and then generalizes via abstraction to many more dimensions.
Working in 2D or 3D can thus help one understand the method that then generalizes to higher dimensions.
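Distance is one common way to measure how "almost equal" two data points are, and it shows this generalization directly. The 2D formula sqrt((x1 - x2)^2 + (y1 - y2)^2) can be written once as a sum over coordinates. Written that way, the same code works in 2D, 3D, or any number of dimensions. This is a minimal sketch. The function name is our own.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates.

    The 2D formula is written as a sum over all coordinates, so it
    applies unchanged in any dimension.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean_distance((0, 0), (3, 4)))                    # 2D: 5.0
print(euclidean_distance((0, 0, 0), (1, 2, 2)))              # 3D: 3.0
print(euclidean_distance((1, 1, 1, 1, 1), (2, 2, 2, 2, 2)))  # 5D: sqrt(5), about 2.236
```

Python 3.8 and later also provide `math.dist(p, q)`, which computes the same Euclidean distance.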