Splunking Email

by Steve Gailey 12. September 2013 17:14

Update


One question people have asked me about this topic is “why”: why Splunk your emails in the first place? I thought I’d share a fuzzy screenshot of a small dashboard I put together to let me look at about seven years of emails. I can do interesting but not particularly useful things, like seeing at what time I get emails from people in New York, or from a particular group of people, and at what time I send emails to those people.

Slightly more usefully, I can see who emails me most and on what subject, and similarly who I send emails to and on what subject. The most useful thing remains being able to find that very elusive email that you know you sent (or were sent). Splunk always was the tool for finding a needle in a haystack, and seven years’ worth of emails is a pretty big haystack! Naturally, you can drill down from this dashboard into the detail you need.


Happy Splunking

 

For a while now, Splunking my email has seemed like a fairly good idea to me. How good would it be to be able to analyse who was sending you emails about audit, or to immediately find that email you were sent last year about bonuses that had an Excel attachment? Better yet, how about analysing who you are communicating with most and what about… It sounds hard to do, but that is exactly what Splunk is good at. Some pretty big organisations are doing it with their entire company email feed and using the Splunk App for Microsoft Exchange to make sense of it all, but I needed something a little more pedestrian. Now the tricky bit: how do you get all your personal emails into Splunk? Well, I obsessively save emails, so I have lots and lots of PST files full of emails that seemed important at the time. Outlook is pretty poor at processing these large files. Try having a few big ones open from a network share if you don’t believe me…

The trouble is that Splunk can’t interpret PST files. To be honest, Microsoft struggles, so what chance do the rest of us have? Well, now that I don’t work seventeen hours a day I find myself at a loose end at the weekends, so I thought I’d tackle this problem. I had already written an application to read PST files (see my previous post) and search them at high speed in memory, so I had the beginnings of what I needed. I just needed to do the same sort of thing, but write the headers out to a CSV file for Splunk to process.
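To give a feel for the CSV step, here is a minimal Python sketch. It is not the code from my utility: the column names in `FIELDS` and the function `write_headers_csv` are made up for illustration, and it assumes something else has already pulled each message’s headers out of the PST as a dictionary.

```python
import csv
from typing import Iterable, Mapping

# Hypothetical column set for illustration; not the utility's actual output format.
FIELDS = ["date", "sender", "recipients", "cc", "subject", "attachments"]


def write_headers_csv(messages: Iterable[Mapping[str, str]], out_path: str) -> int:
    """Write one CSV row per message header and return the number of rows written."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for message in messages:
            # Missing fields become empty strings so every row has the same shape.
            writer.writerow({name: message.get(name, "") for name in FIELDS})
            count += 1
    return count
```

The `csv` module quotes commas and quotation marks for you, which matters because email subjects are full of both. A header row with field names also makes it easier for Splunk to recognise the fields when the file is indexed as CSV.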

Having looked at my old PST files, I realised that I had nine of them, totalling about 13 GB of old data. That answered the next question about requirements: I needed to process several files at once, which meant the program needed to be multi-threaded. Actually, it all proved remarkably simple to put together, and in one afternoon I had a working utility that let me select any number of PST files and then loaded and worked on them in parallel, reading each message in turn and writing the headers out to a file. Fortunately my home-built machine is a six-core AMD box, which this little application kept pretty busy when I ran it for the first time.
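The one-worker-per-file idea can be sketched in a few lines of Python. Again, this is an illustration rather than the code from my utility: `process_all` is a hypothetical name, and `extract` stands in for whatever function reads one PST file, writes its CSV, and returns the number of messages it handled.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def process_all(
    pst_paths: Iterable[str],
    extract: Callable[[str], int],
    max_workers: int = 6,
) -> Dict[str, int]:
    """Run `extract` on every path in parallel and report progress as files finish."""
    paths = list(pst_paths)
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its file so progress can name the file.
        futures = {pool.submit(extract, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            results[path] = future.result()  # re-raises any error from the worker
            print(f"[{done}/{len(paths)}] {path}: {results[path]} messages")
    return results
```

One caveat if you try this in Python specifically: threads there only help while the work is waiting on disk, so if the parsing itself is the bottleneck, swapping in `ProcessPoolExecutor` is the usual way to keep all the cores busy.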

I wrote it to update the user on progress, as sitting and waiting for a file to appear is not the interactive experience I desire. Having created the nine CSV files, I loaded them into my home Splunk server (what do you mean, you don’t have one?) and in seconds they were indexed and I could start querying. I have to say that it was worth the small effort involved. I can very quickly search by subject, sender, recipient, CC, attachments or anything else that is in the header. I can easily graph the trends and relationships. Best of all, I can easily find that elusive email I knew I had received and, having identified it, go back to the original PST to extract the attachment or read the email. I would put wonderful Splunk screenshots up, but I don’t really want you to see my emails, so I suggest you try it for yourself.

PSTHeaders.zip (3.49 MB) Here is the application to extract your PST headers. Just unzip it and run setup to install the application. Let me know if you have any issues with it.

Happy Splunking.

 

Tags:

Splunk | Technical
