Summarizing book names and word count data
1. Summarizing book names and word count data
Let us look at data summarization using count data.
2. Representing and processing count data
3. Goals
The goals in this series include the following.
Look at ways of representing nominal, ordinal, and integer data.
Look at ways of loading the data.
Look at ways of summarizing the data.
Look at ways of visualizing the data.
Along the way, trade-offs and variations will be covered or at least mentioned, since there are many ways to accomplish each of these tasks.
Pointers to more involved data analysis will be given as needed.
4. Data background
The data gives the number of times each letter appears in a corpus of English text. For the present purposes, the corpus details are not needed. A future topic will look at how to get summarized data such as this table from an actual corpus of text. A summary such as this has applications in many fields, including the following.
NLP (Natural Language Processing): text analysis involving letters, bi-grams, tri-grams, etc.
Security: understanding letter substitution ciphers and the code-breaking of those ciphers.
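How a corpus is turned into such a summary is left to a future topic, so the following is only a minimal sketch of the idea, not the method used to produce the data here. It assumes plain ASCII English text; the function names `letter_counts` and `ngram_counts` and the sample sentence are hypothetical, chosen for illustration.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count occurrences of each letter a-z, ignoring case."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")


def ngram_counts(text: str, n: int) -> Counter:
    """Count runs of n adjacent letters within each whitespace-separated word."""
    counts = Counter()
    for word in text.lower().split():
        # Keep only the letters a-z so punctuation does not split or pollute n-grams.
        letters = "".join(ch for ch in word if "a" <= ch <= "z")
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


# Hypothetical sample text, standing in for a real corpus.
sample = "The quick brown fox jumps over the lazy dog."

print(letter_counts(sample).most_common(3))
print(ngram_counts(sample, 2).most_common(3))
```

Calling `ngram_counts(text, 3)` gives tri-gram counts in the same way. Counting within words is one design choice among several; counting across word boundaries is equally possible.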
5. Data
Here is the data in a textual table form.
Letter | Count
a | 59900
b | 11302
c | 14311
d | 34483
e | 95247
f | 17852
g | 13528
h | 62509
i | 46347
j | 2321
k | 4449
l | 25939
m | 19154
n | 51311
o | 54741
p | 9918
q | 183
r | 37468
s | 44630
t | 72396
u | 20651
v | 7458
w | 18376
x | 376
y | 14814
z | 234
For the present purposes, we are not concerned with how this data was summarized from (copies of) the original text.
Eventually, text in other languages and character sets will replace the English text to show how non-English characters can be processed in various data analysis and visualization systems.
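As one possible way of representing and summarizing the table, here is a minimal Python sketch. The dictionary representation and the variable names (`letter_counts`, `relative`, `ranked`) are assumptions made for illustration; the counts themselves are copied from the table in this section.

```python
# Letter counts copied from the table in this section.
letter_counts = {
    "a": 59900, "b": 11302, "c": 14311, "d": 34483, "e": 95247,
    "f": 17852, "g": 13528, "h": 62509, "i": 46347, "j": 2321,
    "k": 4449, "l": 25939, "m": 19154, "n": 51311, "o": 54741,
    "p": 9918, "q": 183, "r": 37468, "s": 44630, "t": 72396,
    "u": 20651, "v": 7458, "w": 18376, "x": 376, "y": 14814,
    "z": 234,
}

# Total number of letters counted.
total = sum(letter_counts.values())

# Relative frequency of each letter (its share of all letters counted).
relative = {letter: count / total for letter, count in letter_counts.items()}

# Letters ordered from most to least frequent.
ranked = sorted(letter_counts, key=letter_counts.get, reverse=True)

# Simple text bar chart, scaled so the most frequent letter gets 40 marks.
largest = letter_counts[ranked[0]]
for letter in ranked:
    bar = "#" * round(40 * letter_counts[letter] / largest)
    print(f"{letter} {letter_counts[letter]:>6} {relative[letter]:6.2%} {bar}")
```

A dictionary keyed by letter keeps lookups simple. A list of (letter, count) pairs would work just as well and would preserve an explicit ordering, which matters more for ordinal data than for nominal data such as letters.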
6. Notation
There are many notations for tables of data. Here is a quick summary.
Database | table | records | fields
Spreadsheet | sheet | rows | columns
... more to be added ...
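The two notations describe the same structure under different names. The sketch below shows the first three lines of the letter data both ways; the field names `letter` and `count` and the variable names are assumptions made for illustration.

```python
import csv
import io

# Database view: a table is a collection of records, each with named fields.
records = [
    {"letter": "a", "count": 59900},
    {"letter": "b", "count": 11302},
    {"letter": "c", "count": 14311},
]

# Spreadsheet view: a sheet is a grid of rows and columns, here with a header row.
rows = [["letter", "count"]] + [[r["letter"], r["count"]] for r in records]

# Column-wise access: transpose the data rows (skipping the header).
letters, counts = zip(*rows[1:])

# The same rows written out as CSV text, a common exchange format for both views.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Here a record corresponds to a row and a field corresponds to a column; only the vocabulary changes between the two views.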