Science Puts Enron E-Mail to Use
In March 2001, just a few months before Enron CEO Jeffrey Skilling resigned, an employee e-mailed him a joke about a policeman pulling over a speeding driver, whose wife subsequently rats him out to the cop for other offenses, including being drunk.
Skilling and Enron chairman Ken Lay, whose federal trial on multiple felony fraud charges starts Monday, may not appreciate the irony: their own e-mails, like the driver's wife, will soon be testifying against them, both in court and in the court of public opinion.
Enron's inbox first hit the internet in March 2003 when the Federal Energy Regulatory Commission made public more than 1.5 million e-mails from 176 Enron employees as part of its investigation of the company's manipulation of California energy markets in 2000.
Journalists quickly scoured the e-mail for embarrassing moments and incriminating missives. Among the finds: Lay family members' thoughts on finding the perfect wedding photographer (someone who had shot one of the Kennedys' weddings); Enron executives angling for ambassadorships and positions in the Bush administration; instructions from Tom DeLay's staff to Lay and Skilling on how to handle $100,000 contributions; and messages from Lay's secretary bemoaning the fact that she could not get tech support to fix Lay's phone, which would disconnect if answered before the third ring.
All this sits among countless jokes about Texas, sex, nuns, women, Latinos and priests. Other tasteful tidbits include an offensive booty-call contract and a fashion critique of the government lawyers investigating Enron.
The e-mails drew the attention of more than just Californians looking for some payback for the rolling blackouts and astronomical energy bills. InBoxer, an antispam company, turned to the archive to help test its newest product, which scans company e-mails in real time for objectionable content or confidential information, according to CEO Roger Matus.
For an accurate test, Matus needed a sample of corporate e-mail in all its raw, unadulterated drama and glory. He was unsure how useful the Enron e-mails would be until he loaded the database and looked at the first message.
The e-mail read, in its entirety: "So you were looking for a one-night stand, after all?"
"That was the moment I knew we had a good testing corpus," Matus said.
Of the 500,000 e-mails InBoxer included in the database, the company's algorithms identified 10,275 with offensive words and another 71,268 with potentially inappropriate content, such as sexual innuendo or lists of employee Social Security numbers.
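InBoxer's algorithms are proprietary, and the article does not describe how they work. As a rough illustration of the simplest kind of rule-based flagging, the Python sketch below sorts messages into the two buckets Matus describes: offensive words, and other inappropriate content such as Social Security numbers. The word list, the number pattern and the function names (`classify`, `tally`) are hypothetical; a commercial product would rely on far richer models than a keyword match.

```python
import re

# Hypothetical placeholder word list; InBoxer's real lexicon is not public.
OFFENSIVE_WORDS = {"damn", "hell"}

# U.S. Social Security numbers are commonly written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify(message: str) -> str:
    """Label one message 'offensive', 'inappropriate', or 'clean'."""
    # Lowercase and split into simple word tokens before checking the list.
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & OFFENSIVE_WORDS:
        return "offensive"
    # Leaked confidential data counts as inappropriate even without bad words.
    if SSN_PATTERN.search(message):
        return "inappropriate"
    return "clean"


def tally(messages):
    """Count how many messages fall into each bucket."""
    counts = {"offensive": 0, "inappropriate": 0, "clean": 0}
    for message in messages:
        counts[classify(message)] += 1
    return counts
```

Checking offensive words first means a message containing both a flagged word and a Social Security number is counted once, in the offensive bucket, so the two totals never overlap.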
"Enron had an extreme culture of people who worked hard and played hard," Matus said.
Company engineers also found some great jokes, including one about how to feed a pill to a cat, inspiring InBoxer to make the e-mails searchable inside a demo of the new product, called the Anti-Risk Appliance.
While searching the e-mails for more on the Raptors, the off-balance-sheet partnerships Enron used to hide losses, visitors can also compete for Apple iPod shuffles awarded to whoever digs up the funniest joke, the most fireable e-mail and the most regrettable message sent.
Commercial outfits aren't the only ones exploiting the Enron e-mail dump.
Academic researchers quickly realized the e-mails were a rare, openly available data trove for anyone studying social networks or information analysis and retrieval.
The database soon came to be known as the Enron corpus.
In 2004, Marti Hearst, a professor at the University of California at Berkeley's School of Information Management & Systems, tasked students in her natural-language-processing course with cleaning up the database to make it searchable.
"It is a way for students to see -- when they run text-classification algorithms on e-mail messages versus newsgroups -- how well those would do," Hearst said. "E-mail is one of the more difficult kinds of information to process."
While Hearst says the jury is still out on how useful the Enron corpus will be to researchers, she argues that shared corpora like it are key to advancing computer science research quickly, because they let researchers compare different algorithms on the same data.
Companies such as IBM and Microsoft have an advantage because their researchers can test ideas on internal or customer e-mail. For now, no benchmark exists for e-mail collections equivalent to the one the National Institute of Standards and Technology developed for search engine algorithms.
After Hearst's class further narrowed the corpus and hand-tagged the e-mails related to the California energy meltdown, Jeffrey Heer, a Berkeley doctoral student, used that subset to build a visualization engine that mapped how information flowed through the company. In the messages about the California energy crisis, he spotted an anomaly.
"There was one guy, John Shelk, who was Enron's D.C. lobbyist who would send out group e-mails and reports on Washington affairs, such as what a proposal's effect would be on Enron," Heer said. "There were lots of back-and-forth conversations, but there was one person, Tim Belden, who was getting all these reports from Washington, but there was no backtalk. He looked like just some guy stuck on a mailing list."
Unsure whether Belden had simply deleted all his e-mail or communicated through other channels, Heer Googled the name and found that Belden was the Enron trader now known as the mastermind of the California energy price-gouging scheme. He pleaded guilty in 2002 to conspiracy to commit wire fraud.
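Heer's engine was a visual tool, and the article does not say how it worked internally. The pattern he noticed, a recipient with plenty of incoming mail but no replies, can nonetheless be sketched in a few lines of Python by counting sent and received messages per person. This is a simplified stand-in, not Heer's method; the `(sender, recipients)` input format, the `min_received` threshold and the function name `silent_recipients` are assumptions made for illustration.

```python
from collections import Counter


def silent_recipients(messages, min_received=3):
    """Find people who receive mail but never send any.

    messages: iterable of (sender, [recipients]) pairs.
    Returns (person, messages_received) pairs for everyone who received at
    least min_received messages and never appears as a sender, sorted by
    most messages received, then by name.
    """
    received = Counter()
    sent = Counter()
    for sender, recipients in messages:
        sent[sender] += 1
        for recipient in recipients:
            received[recipient] += 1
    # A Counter returns 0 for missing keys, so non-senders have sent[p] == 0.
    silent = [
        (person, count)
        for person, count in received.items()
        if count >= min_received and sent[person] == 0
    ]
    return sorted(silent, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical mailing-list traffic: a lobbyist sends reports, an analyst
# replies, and a trader receives everything but never writes back.
sample = [
    ("lobbyist", ["trader", "analyst"]),
    ("lobbyist", ["trader", "analyst"]),
    ("analyst", ["lobbyist"]),
    ("lobbyist", ["trader"]),
]
print(silent_recipients(sample))
```

Counting rather than drawing loses the visual context Heer's tool provided, but it makes the "no backtalk" pattern easy to query across thousands of mailboxes; as Heer found, a flagged name is only a lead, since the person may have deleted mail or used other channels.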
The biggest lesson Heer took from the e-mails is a simple one: employees should use personal accounts for personal matters.
Skilling is charged with 35 counts of insider trading, conspiracy, fraud and lying to auditors; Lay faces seven counts of fraud and conspiracy. Both men have pleaded not guilty to all charges.
As for the ex-Enron executives who have been convicted in the scandal, perhaps the lesson learned is that business e-mail should be used for business purposes, not fraudulent accounting and felonious market manipulation.