Data science and data
1. Data science and data
Getting good/usable data is the biggest obstacle to data science.
Analogy: American football
the quarterback is the flashy part of the game - the data science methods
the linemen are the foundation of good play - the data
Without good data, data science methods will not provide good results.
Old saying:
garbage in, garbage out
2. Litter
Saying:
put litter in its place
Of course, once it is in "its place", it is no longer litter.
3. OpenXML as data
Files in OpenXML can be the source of textual data for data science.
OpenXML (formally Office Open XML) is an open, XML-based standard for representing textual data in the form of documents, spreadsheets, presentations, etc.
docx - documents
xlsx - spreadsheets
pptx - presentations
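As a rough illustration of how a docx file can serve as a source of text, the sketch below uses only the Python standard library. It relies on the fact that an OpenXML file is a ZIP package whose main document text is stored in the part word/document.xml. The function names docx_paragraphs and make_sample_docx are hypothetical, and the sample package is a stripped-down stand-in built in memory, not a complete docx that a word processor would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in a docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a docx file.

    `source` is a file path or a binary file-like object.
    (Hypothetical helper name, for illustration only.)
    """
    with zipfile.ZipFile(source) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx():
    """Build a tiny in-memory package holding only word/document.xml.

    Assumption: this is just enough structure for the sketch above;
    a real docx contains additional parts (content types, relationships).
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>garbage in, </w:t></w:r>"
        "<w:r><w:t>garbage out</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>put litter in its place</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for line in docx_paragraphs(make_sample_docx()):
        print(line)
```

With a real file, the same function could be called with a path, for example docx_paragraphs("report.docx"), where the file name is a placeholder. Text in headers, footers, and footnotes lives in other parts of the package and is not covered by this sketch.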
4. Raw OpenXML
5. Microsoft OpenXML SDK
6. OpenXML using Python
7. Images as data
8. Web pages as data
Web pages can be the source of data for data science.
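One possible way to turn a web page into textual data is sketched below, using the standard library's html.parser module. The class name TextExtractor, the function html_to_text, and the SAMPLE_PAGE string are hypothetical and only for illustration. Fetching the page over the network is left out, and real pages usually need more cleanup than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text fragments of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample page, standing in for downloaded HTML.
SAMPLE_PAGE = (
    "<html><head><title>Data</title><style>p{color:red}</style></head>"
    "<body><h1>Litter</h1><p>put litter in its place</p>"
    "<script>var x = 1;</script></body></html>"
)

if __name__ == "__main__":
    print(html_to_text(SAMPLE_PAGE))
```

The markup, styles, and scripts are the "litter" here: once they are stripped away, what remains is text that data science methods can use.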
9. End of page