Data science: topics
by RS  admin@robinsnyder.com


1. Data science: topics
This page contains links to data science content - most using Python as the programming language.

This content was used in a data science course for computer science majors who may not have used Python before.

This content will not be updated again until I need to use it for another course - which might not happen.
For specific content on Python, see Data science: Python.
  • Python: Bar chart from text and CSV
  • Python: Bar chart from JSON
  • Python: Bar chart from XML
  • Data science: overview
  • Data science overview (for getting started with a project)
  • Data science ideas: initial
  • Data science steps
  • Set theory for data science
  • Coin flips
  • Data science: rolling dice
  • Decision trees (C code needs updating, R added)
  • Frequency distribution diagrams (background for A2)
  • Data science and data
  • Natural language processing
  • CAPTCHA overview
  • Data for data science projects
  • Word clouds using Python
  • Natural Language Parsing examples
  • Python simple statistics
  • Probability estimation
  • Matplotlib: charts
  • Longest common subsequence

  • 2. Visualization
  • Data science: Insight and prediction
  • Data categories
  • Graphics coordinate systems
  • Matplotlib: DPI tricks and plot class and rsPlot module

  • 3. Regression
  • Linear equations
  • Regression and correlation (we will finish this on Thursday)
  • Matplotlib: Linear regression
  • Simple linear relationships

  • 4. Visualization: plots, colors
  • Truth tables: programmed method (as a non-trivial example of the Python walrus operator)
  • Matplotlib: subplots and image formats
  • Matplotlib: Chart types
  • Matplotlib: Color models

  • 5. Visualization: nominal data
  • Summarizing data : The M&M Problem
  • Closeness: arithmetic and geometric progressions
  • Matplotlib: World population (to be finished next time)

  • 6. Visualization
  • Math: Exponents and logarithms
  • Distributions and sampling
  • Python: Normal and exponential distributions
  • Data and information visualization
  • Prediction from data

  • 7. Data scraping
  • Web scraping
  • XML: Extensible markup language
  • RSS: Really Simple Syndication
  • Python: Web scraping
  • Counting

  • 8. Clustering, k-means
  • Data clustering
  • Equivalence relations: math
  • Customer matching
  • Euclidean distances
  • Map distances
  • K-Means clustering
  • Python: Clustering

  • 9. Clustering: specific examples
  • Statistics: Odds ratio
  • Entropy function
  • Lorem Ipsum text

  • 10. Random sequences, suggestions, bar charts
  • Pseudo random sequence
  • Raspberry Pi: random number generator
  • Misleading charts
  • Auto-complete search terms
  • Bar charts: Interesting

  • 11. Decision trees and random forests
  • Bar charts: Interesting (example and setup for later discussion)
  • Data flow case study (updated 2020-04-15)
  • Graphviz: dot (non-Python install)
  • Graphviz: expression trees (non-Python usage)
  • The pitcher-batter problem (decision analysis against adversary, setup for decision trees)
  • Measurement and error
  • XBox Kinect for measuring bone length

  • 12. Zip data, decision analysis
  • Zip file compression
  • Matplotlib: 3D plots
  • Expected value: biased coin flips
  • Simple decision analysis (background, not covered in class)
  • Making decisions using information entropy

  • 13. Natural Language Processing
  • Distribution simulation
  • (the following are to get you thinking about data science models)
  • Models and reality
  • Murphy's Law
  • Regression to the mean

  • 14. Bayesian statistics, naive Bayes classification
  • Models of duality: computation
  • Conditional probability
  • Internet searches: Estimating probabilities
  • Predictive coding: precision and recall
  • Bayes Rule: Cancer testing
  • Computer literacy competency exams
  • Bayesian statistics and models (will pick up near the end of this page)

  • 15. Simpson's paradox, graphical models
  • Case study: color analysis
  • Influence diagrams
  • Graphical models
  • Simpson's paradox
  • Visualizing Simpson's Paradox

  • 16. Topic modeling: LDA, LSI
  • Topic modeling: overview
  • Latent Semantic Analysis
  • Topic models: Political news trends
  • Topic modeling: Conference proceedings
  • Recommendation engine starting point
  • Hadoop, Mahout, MapReduce, etc.

  • 17. Gaussian mixture models
  • Topic model distributions
  • Hill climbing algorithms
  • Enron data
  • Data science: decision trees
  • Gini index
  • Python: decision trees
  • Inductive bias
  • Spam: filtering
  • Location text data conversion
  • Clustering example using cities
  • NLP: Translation issues
  • Python: Turtle graphics and rsTurtle module
  • Regression, neural networks, and fuzzy approximation
  • Deep learning
  • Hammer and nail

  • 18. F1: Final #1, final exam

