April 3, 2010

A Slight Detour

Posted in Uncategorized at 10:11 pm by dgcombs

Most companies and service providers that are responsible for networks have an email address specifically for reports of abuse coming from their network. In fact, one of the fields on the request for a block of IP addresses is a point of contact for abuse. My employer is no different. Mail sent to our abuse address is redirected to several people, including me. One of us is supposed to act on the notification or complaint. These messages come from SpamCop, JunkMailFilter or similar organizations. They notify us of anything from an unwanted piece of spam to an email-borne virus to a pirated copy of a movie.

At one time, I suggested it might be beneficial to count all the messages we got daily and use that as a metric for how well we are executing against our Acceptable Use Policies. So I talked to the Exchange administrator and convinced him to allow just one more mailbox to collect messages sent to Abuse. It took some effort, but I promised to keep the mailbox clean and limit disk utilization. So he set up ABUSECOUNTER for me. Early last week, I remembered my promise and surveyed the mailbox: 11,585 messages. It hadn’t been touched since March 2008. It was time to leap into action!

Fortunately, one of my coworkers had previously requested that POP3 service be turned on. POP3 allows mail to be downloaded by non-Microsoft clients like Eudora and PINE, or even a Microsoft client like Outlook Express. It also makes scripting the download much more straightforward. Better still, my scripting language of choice these days, Python, has a ready-to-run POP3 interface. Rather than simply cleaning out the mailbox, I decided to keep a count of the number of messages received per day. That way, I could produce a nice report of email abuse to go along with my firewall logs.

I created another database on my MongoDB server called abusecounter and prepared to enter the data into a collection, or table, called abuses.

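import pymongo

# mongoserver holds the host name of the MongoDB server.
# Connection is the PyMongo class of the time; later releases use MongoClient.
mongodb = pymongo.Connection(mongoserver)
abusedb = mongodb.abusecount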

POP3 can retrieve the whole message for you in one fell swoop. However, playing with the commands, I found a much faster method. POP3 has a command called TOP which pulls down just the header section:

X-MimeOLE: Produced By Microsoft Exchange V6.5
Received:  from smtp1.blahblah.net ([555.12.12.12]) by xxx.yyy.zzz with Microsoft SMTPSVC(6.0.3790.3959); Mon, 31 Mar 2008 16:56:06 -0400
Received:  (qmail 16625 invoked from network); 31 Mar 2008 20:41:15 -0000
Return-Path: <techanalyst@blahblah.com.eg>
X-OriginalArrivalTime: 31 Mar 2008 20:56:07.0031 (UTC) FILETIME=[A3E48870:01C89371]
Subject: =?windows-1256?B?z+bRyePK3s/jyd3tIMfhys3h7eEgx+Hd5O0g4eHC0+Xj?=
Date: Mon, 31 Mar 2008 14:38:35 -0400
Message-ID: <3816-220083131183835180@Hssn>
From: "E-ADVERTISER" <techanalyst@BlahBlah.com.eg>
To: "Saudi" <techanalyst@BlahBlah.com>
Reply-To: "E-ADVERTISER" <techanalyst@BlahBlah.com>

That’s quite a chunk of information, and I’ve even removed much of it. TOP comes with a warning that it may not be properly implemented on all POP3 servers. However, on Exchange, it seems to work quite well.
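For anyone who wants to try TOP from Python, here is a minimal sketch using the standard poplib and email modules. It is not the script from this post: the function names date_header and fetch_date_headers and the host, user and password parameters are made up for illustration, and it assumes Python 3, where poplib returns each header line as bytes.

```python
import email
import poplib


def date_header(header_lines):
    """Return the value of the Date: header from the lines TOP returns."""
    # poplib hands back each header line as bytes without line endings,
    # so rejoin them and end with a blank line to close the header block.
    raw = b"\r\n".join(header_lines) + b"\r\n\r\n"
    message = email.message_from_bytes(raw)
    return message["Date"]


def fetch_date_headers(host, user, password):
    """Yield the Date: header of every message in the mailbox (hypothetical helper)."""
    mailbox = poplib.POP3(host)
    try:
        mailbox.user(user)
        mailbox.pass_(password)
        count, _size = mailbox.stat()
        # POP3 numbers messages from 1; TOP n 0 asks for headers and no body lines.
        for number in range(1, count + 1):
            _response, lines, _octets = mailbox.top(number, 0)
            yield date_header(lines)
    finally:
        mailbox.quit()
```

Letting the email module pick out the header avoids matching on the text "Date: " by hand, which could also catch other headers whose names end the same way.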

The part we’re interested in is the line that starts with Date:. Date: Mon, 31 Mar 2008 14:38:35 -0400 tells us when the message was sent. To store that information in MongoDB, each document in the abuses collection looks like this:

{
    "count" : 9,
    "date" : {
        "day" : 31,
        "month" : 3,
        "year" : 2008
    }
}

The count field tracks the number of messages we got on that day. The date field keeps the day, month and year the messages were sent. Simple. The Python script tore through the first 711 messages in almost no time at all. Then it choked on a message sent in February, 2010.

import re
import time

# Assumed context: messages is an open poplib.POP3 connection and
# messagecount is the number of messages waiting in the mailbox.
print("Loading %s messages." % messagecount)
for msg in range(messagecount):
    messageheader = messages.top(msg+1,0) # get headers only; POP3 numbers messages from 1
    for i in messageheader[1]:
        if(re.search("Date: ",i)):
            print(i)
            # chop the last six characters (the " -0400" offset) before parsing
            msgdate = time.strptime(i[:len(i)-6],"Date: %a, %d %b %Y %H:%M:%S")
            myDay = msgdate.tm_mday
            myMon = msgdate.tm_mon
            myYr = msgdate.tm_year
            print("Inserting %s with date %s" % (msg,myDay))
            abusedb.abuses.update({"date.year":myYr,"date.month":myMon,"date.day":myDay},{"$inc":{"count":1}},True)

The line if(re.search("Date: ",i)): looks for the header line with Date: in it. When it finds this line, it hands it off to a built-in Python function called strptime, which helpfully finds the day, month and year and converts them into integers. Unfortunately, strptime is very picky, even down to the number of spaces it finds. So when the length of the Date: line changed, it quit. That part needs an update.

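One possible way around strptime’s pickiness is to let the standard library’s email.utils.parsedate_to_datetime do the parsing, since it is written for mail Date headers and tolerates extra spaces, a missing weekday or a different UTC offset. This is only a sketch, assuming Python 3.3 or later and assuming the troublesome header was still a well-formed mail date; the post does not show the line that failed, and message_day is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime


def message_day(header_line):
    """Return (year, month, day) for a raw 'Date: ...' header line."""
    # Keep everything after the first colon, i.e. the header's value.
    value = header_line.split(":", 1)[1].strip()
    # The result is in the sender's local time, as in the strptime version.
    sent = parsedate_to_datetime(value)
    return sent.year, sent.month, sent.day
```

A header the function cannot make sense of still raises an exception, so a long-running cleanup would probably want to catch that and skip the message rather than stop.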
Speaking of update, the line that starts with abusedb.abuses.update includes this: {"$inc":{"count":1}}. That increments the count of emails received on that date by one. The final True argument turns on upsert behavior: if the update finds a matching record, it increments the count; if it cannot find one, it adds a new record with count set to 1. MongoDB has a pretty clever name for this, the upsert, a combination of update and insert in one fell swoop.

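In more recent PyMongo releases the same upsert is usually written with update_one and an explicit upsert=True keyword rather than a bare True. A minimal sketch, assuming PyMongo 3 or later; count_message is a hypothetical helper, and collection stands for something like abusedb.abuses:

```python
def count_message(collection, year, month, day):
    """Add one to that day's count, creating the document if it is missing."""
    # The dotted keys match fields inside the nested "date" sub-document.
    day_filter = {"date.year": year, "date.month": month, "date.day": day}
    # $inc on a missing field starts from zero, so a new document gets count 1.
    return collection.update_one(day_filter, {"$inc": {"count": 1}}, upsert=True)


# Example call (needs a reachable MongoDB server, so it is left commented out):
# count_message(abusedb.abuses, 2008, 3, 31)
```

Naming the keyword makes the intent visible at the call site, which is handy when the script is reread a couple of years later.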
I’ve managed to clean the abusecounter mailbox, at least up until February, 2010. The IT manager is happier and I can now track the number of complaints we get by date.
Abuse Notifications


Posted via email from Meyeview (Posterous Style)
