Data science ideas: initial
by RS  admin@robinsnyder.com : 1024 x 640


1. Class project
Several assignments will be for an individual or group project in data science.

You should find a problem for which you wish to apply data science techniques.

2. Scope
Whenever I have an idea for an identified project, I start with the following. I usually begin with the minimum needed to get started, then add requirements as needed.

YAGNI (You Ain't Gonna Need It).

3. Software systems development
The YAGNI concept is related to the Pareto principle (the 80-20 rule).

4. Phases
This page has ideas for data science projects, investigations, etc.

Any remarks in square brackets are from projects on which I have worked.

5. Geographic information systems

6. Bayesian classification
Bayesian classification can be applied to any yes-no question for which data is available.
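
As a rough illustration of answering a yes-no question from data, here is a minimal naive Bayes sketch in plain Python. The function names, the toy data (outlook, windy), and the add-one smoothing that assumes two possible values per feature are all assumptions made for this example, not part of any particular project.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels):
    """Count label frequencies and per-label feature-value frequencies."""
    label_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
    return label_counts, value_counts

def predict(model, row):
    """Return the label with the highest log posterior score."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            seen = value_counts[label][i][value]
            # Add-one smoothing; the "+ 2" assumes two possible values per feature.
            score += math.log((seen + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical data: (outlook, windy) -> did we play outside?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "yes", "no", "no", "yes", "no"]

model = train_naive_bayes(rows, labels)
prediction = predict(model, ("sunny", "no"))
```

The "naive" part is the assumption that features are independent given the label, which is why the per-feature probabilities can simply be multiplied (added, in log space).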

7. Topic modeling
Documents (Customers), Vocabulary (Products), Words used in documents (Products bought by a customer).
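
The analogy above can be sketched by building the count matrix that a topic modeling algorithm would take as input. This sketch only builds the matrix and does not fit any topics; the customer names and products are made up for illustration.

```python
from collections import Counter

# Hypothetical purchase histories: each customer plays the role of a document.
purchases = {
    "alice": ["milk", "bread", "milk"],
    "bob": ["nails", "hammer"],
    "carol": ["bread", "milk", "hammer"],
}

# The vocabulary is the set of all products, in a fixed (sorted) order.
vocabulary = sorted({product for items in purchases.values() for product in items})

# One row per customer (document), one column per product (word).
count_matrix = {}
for customer, items in purchases.items():
    counts = Counter(items)
    count_matrix[customer] = [counts[product] for product in vocabulary]
```

Each row is the "bag of words" for one customer: how many times each product was bought, ignoring order.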

8. Intellectual property forensic analysis

9. Dimensionality reduction

10. Time series data
Time series analysis involves data recorded over time, which often has trends and periodic cycles.
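
One simple way to look for a periodic cycle is autocorrelation: compare the series with a shifted copy of itself and see which shift (lag) matches best. The sketch below is a minimal plain-Python version; the function names and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance

def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Hypothetical data: a pattern that repeats every 4 steps, for 6 cycles.
series = [10, 20, 30, 20] * 6
period = dominant_period(series, max_lag=6)
```

Real data would also have noise and possibly a trend, which is usually removed before looking for cycles.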

11. Data collection, analysis, and display

12. Decision trees and random forests
Decision trees group data in a tree structure, from the most important/prevalent features to the least important/prevalent.
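
To make "most important" concrete, one common measure is Gini impurity: the feature whose split leaves the purest groups goes at the root of the tree. The sketch below picks only the root feature and does not build a full tree; the names and toy data are assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 means all labels in the group are identical."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after grouping rows by one feature's value."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())

def best_feature(rows, labels):
    """The feature whose split leaves the least impurity goes at the root."""
    return min(rows[0].keys(), key=lambda f: split_impurity(rows, labels, f))

# Hypothetical data: here "outlook" fully determines the label, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
]
labels = ["yes", "yes", "no", "no"]
root = best_feature(rows, labels)
```

A full decision tree repeats this choice within each group; a random forest builds many such trees on random subsets of the data and features and combines their votes.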

13. Natural language processing

14. Text processing

15. Regression

16. Clustering
Clustering is used to partition a set of data into groups.
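
A minimal sketch of one clustering method, k-means (Lloyd's algorithm), on one-dimensional data. The function name, the starting centers, and the toy values are assumptions for illustration; other clustering methods partition data differently.

```python
def k_means_1d(values, centers, iterations=10):
    """Lloyd's algorithm on one-dimensional data with given starting centers."""
    centers = list(centers)
    groups = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = k_means_1d(values, centers=[0.0, 5.0])
```

The result depends on the starting centers, so in practice the algorithm is usually run several times from different starts.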

17. Gaussian mixtures
Gaussian mixture models are used to infer multiple (normal) distributions in aggregate data.
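
As a sketch of how the separate distributions can be inferred, here is a small EM (expectation-maximization) loop for a mixture of two one-dimensional normal distributions. The function names, starting means, and toy data are assumptions for illustration; the standard deviation floor of 1e-3 is an arbitrary guard against a component collapsing onto a single point.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_two_gaussians(data, means, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM from starting means."""
    means = list(means)
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: how responsible is each component for each point?
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component from its weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # assumed floor, avoids collapse
    return weights, means, stds

# Hypothetical aggregate data drawn from two well-separated groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
weights, means, stds = fit_two_gaussians(data, means=[0.0, 15.0])
```

Unlike k-means, each point belongs to every component with some probability, and each component has its own spread as well as its own center.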

18. Statistical distributions

19. Kernel density estimation
Kernel density estimation is used to estimate the probability density of data without assuming a particular distribution.
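
A minimal sketch of kernel density estimation with a Gaussian kernel: a small bump is placed on each sample and the bumps are averaged. The function name, the bandwidth of 0.5, and the sample values are assumptions for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from `samples`."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum of Gaussian bumps, one centered on each sample.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

# Hypothetical samples: three close together and one far away.
samples = [1.0, 1.5, 2.0, 6.0]
density = gaussian_kde(samples, bandwidth=0.5)

# Sanity check: a density should integrate to about 1 (Riemann sum over [-5, 11)).
step = 0.01
area = sum(density(-5 + i * step) * step for i in range(1600))
```

The bandwidth controls the smoothing: a small value gives a spiky estimate, a large value blurs nearby groups together.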

20. Neural networks
Neural networks are used to recognize patterns, often to answer a yes-no question. [best buy for computer given competing data]
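
The simplest case of yes-no pattern recognition is a single artificial neuron (a perceptron). The sketch below trains one to learn logical AND; it is not a full multi-layer network, and the function names, learning rate, and epoch count are assumptions for illustration.

```python
def classify(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(inputs, targets, epochs=20, rate=1.0):
    """Perceptron learning rule: nudge the weights whenever the output is wrong."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            error = target - classify(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

# Yes-no pattern to learn: logical AND of two inputs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(inputs, targets)
predictions = [classify(weights, bias, x) for x in inputs]
```

A single neuron can only learn patterns that a straight line can separate; networks with hidden layers are needed for patterns such as XOR.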

21. Manifold learning
Manifold learning provides a way to do NLDR (nonlinear dimensionality reduction).

22. Support Vector Machines
An SVM (Support Vector Machine) is a mathematical way to do classification and regression.

23. Deep learning and TensorFlow
Deep learning attempts to reduce the amount of "feature extraction" needed to analyze data.

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. https://tensorflow.org

24. Data sources
