Enron data
by RS  admin@robinsnyder.com : 1024 x 640


1. Enron data
The Enron energy company scandal was in the news in 2001.

After the event, the company's data, in particular the email data (except for some personal email messages), was made public. That data is often used to test new ways of sifting through large amounts of realistic email for certain patterns.

2. Background
The Enron data set has 125,000+ email messages amounting to 260+ MB of text. I have parsed them and put them into a form suitable for processing.
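The parsing step can be sketched roughly as follows. This is a minimal illustration using Python's standard `email` module, not the parser actually used here; the function name `parse_message`, the record fields, and the sample message are all made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def parse_message(raw_text):
    """Turn one raw RFC 822 style message into a flat record (hypothetical layout)."""
    msg = message_from_string(raw_text)
    # getaddresses splits a header such as "a@x.com, b@y.com" into (name, address) pairs.
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    date_header = msg.get("Date")
    return {
        "from": msg.get("From", ""),
        "to": recipients,
        "date": parsedate_to_datetime(date_header) if date_header else None,
        "subject": msg.get("Subject", ""),
        # For a plain (non-multipart) message the payload is the body text;
        # multipart messages would need extra handling not shown here.
        "body": msg.get_payload(),
    }


# Made-up sample message, not taken from the data set.
SAMPLE = """\
From: alice@example.com
To: bob@example.com, carol@example.com
Date: Mon, 14 May 2001 09:30:00 -0700
Subject: Gas contract

Please review the attached contract.
"""

record = parse_message(SAMPLE)
print(record["from"], record["to"], record["date"].year, record["subject"])
```

Records in a flat form like this are easy to feed into later steps such as counting traffic or building a word list.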

Using LSI/LDA results in out-of-memory errors, so I have started working with 64-bit processing rather than 32-bit processing. I have worked with the large Enron data set (120,000+ documents, 90,000+ words) using multiple cores on the same computer to speed up the computations.
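A rough back-of-the-envelope calculation shows why 32-bit processing runs out of room. The sketch below assumes a dense document-by-word matrix of 8-byte floats, which is an assumption for illustration only; real LSI/LDA implementations typically use sparse storage and need far less, so treat the result as an upper bound rather than a measurement.

```python
# Assumptions (not from the measurements above): dense storage, 8-byte floats.
N_DOCS = 120_000
N_WORDS = 90_000
BYTES_PER_FLOAT64 = 8

dense_bytes = N_DOCS * N_WORDS * BYTES_PER_FLOAT64
address_space_32bit = 2 ** 32  # bytes addressable with a 32-bit pointer

print(f"dense matrix: {dense_bytes / 2**30:.1f} GiB")
print(f"32-bit address space: {address_space_32bit / 2**30:.1f} GiB")
print("fits in 32-bit address space:", dense_bytes <= address_space_32bit)
```

Even if sparse storage cuts the footprint by a large factor, intermediate results can still exceed what a 32-bit process can address.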

This data can be used for experiments in Predictive Coding research. The initial topic modeling with a few hundred records looked promising.

3. Topic modeling
If the documents are email messages, as in a legal proceeding, then the topics may help identify emails that are relevant to the case. I and others have looked at this issue using the freely available Enron emails that were part of the Enron court proceedings.
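To make the idea concrete, here is a minimal sketch of LSI on four tiny made-up messages, using a truncated SVD from NumPy. It is an illustration under stated assumptions, not the code used for the experiments: the documents, the choice of `k = 2`, and the helper `cosine` are all invented for the example, and a real run would use a dedicated library with sparse matrices.

```python
import numpy as np

# Made-up stand-ins for email bodies; not taken from the Enron data.
docs = [
    "gas pipeline contract price",
    "pipeline gas delivery contract",
    "lunch friday meeting",
    "meeting lunch schedule friday",
]

# Build a term-document count matrix (rows = terms, columns = documents).
vocab = sorted({word for doc in docs for word in doc.split()})
row_of = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for col, doc in enumerate(docs):
    for word in doc.split():
        counts[row_of[word], col] += 1

# LSI: keep only the k largest singular values of the term-document matrix.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
doc_topics = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two document vectors in topic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print("doc 0 vs doc 1:", round(cosine(doc_topics[0], doc_topics[1]), 3))
print("doc 0 vs doc 2:", round(cosine(doc_topics[0], doc_topics[2]), 3))
```

In this toy case the two contract messages end up close together in topic space and far from the lunch messages, which is the property that could help a reviewer pull out emails related to a case.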

4. Enron email message traffic
Here is a chart of the Enron email message traffic over time.

[Chart: Enron email message traffic]

The field of security uses traffic analysis, including who sent messages to whom and when, to identify interesting aspects of what is happening.
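A simple form of traffic analysis can be sketched with counters over (sender, recipient, time) records. The records below are made up for illustration, and the tuple layout is an assumption, not the format of the parsed data.

```python
from collections import Counter
from datetime import datetime

# Made-up (sender, recipient, sent-at) records; not taken from the Enron data.
messages = [
    ("alice@example.com", "bob@example.com", datetime(2001, 5, 14, 9, 30)),
    ("alice@example.com", "bob@example.com", datetime(2001, 5, 20, 16, 5)),
    ("bob@example.com", "carol@example.com", datetime(2001, 6, 2, 11, 0)),
    ("alice@example.com", "carol@example.com", datetime(2001, 6, 3, 8, 45)),
]

# Message volume per month: the kind of series a traffic-over-time chart plots.
per_month = Counter(sent.strftime("%Y-%m") for _, _, sent in messages)

# Who sent messages to whom, and how often.
per_pair = Counter((sender, recipient) for sender, recipient, _ in messages)

for month in sorted(per_month):
    print(month, per_month[month])
for (sender, recipient), n in per_pair.most_common(1):
    print(f"busiest pair: {sender} -> {recipient} ({n} messages)")
```

The per-month counts give the traffic-over-time series, and the per-pair counts give a first look at who talks to whom.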

5. Enron stock price
Here is a chart of the Enron stock price over time.

[Chart: Enron stock price]
