February 26, 2010

Schlepping the Firewall Log

Posted in Uncategorized at 4:29 pm by dgcombs

I’m scared of bots.

No, not the kind that used to hang around with Will Robinson. Those guys are OK; they warn you of “Danger, Will Robinson!” I’m scared of the kind that lurk in hidden corners of someone’s PC and suddenly spew out mounds of SPAM when you least expect it. These are the bad guys. I’m not the only one scared, either. Dark Reading reports that even Microsoft has been going after these guys. So I figured I’d help. I came up with a Plan!

Step 1. Find out whether anyone inside my firewalls is infected with a bot.

Hmmm… how do you do that? Well, according to at least one well-respected article, these bots send out lots of traffic on IP port 6667, trying to reach Internet Relay Chat (IRC) channels for their orders. At a vendor seminar I recently attended, the speaker suggested that Domain Name System (DNS) and email (SMTP) traffic, on ports 53 and 25, would be the telltale signature of an infected machine. So I downloaded my firewall logs, converted them to ASCII format and tried to wade through them looking for markers. How tedious. Perhaps a database?
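Port heuristics like these are easy to turn into a filter. The following is a minimal Python sketch, not the method used in this post: it assumes a hypothetical pipe-delimited layout (`time|action|src|dst|proto|service`) and tallies, per source address, how often each suspect port appears.

```python
from collections import Counter

# Hypothetical layout of one pipe-delimited log line:
#   time|action|src|dst|proto|service
# The real column order depends on how the logs were exported; adjust these.
SRC, SERVICE = 2, 5

# Ports mentioned above as possible bot signatures.
SUSPECT_PORTS = {"6667": "irc", "53": "dns", "25": "smtp"}

def suspect_counts(lines):
    """Tally (source address, suspect service) pairs across log lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SERVICE:
            continue  # skip short or malformed lines
        service = SUSPECT_PORTS.get(fields[SERVICE])
        if service is not None:
            counts[(fields[SRC], service)] += 1
    return counts

sample = [
    "1266000000|accept|10.0.0.5|192.0.2.10|tcp|6667",
    "1266000001|accept|10.0.0.5|192.0.2.53|udp|53",
    "1266000002|accept|10.0.0.7|198.51.100.1|tcp|443",
    "1266000003|accept|10.0.0.5|192.0.2.25|tcp|25",
    "1266000004|accept|10.0.0.5|192.0.2.10|tcp|6667",
]
for (src, service), hits in suspect_counts(sample).most_common():
    print(src, service, hits)
```

A high count is a reason to look closer, not proof of infection: your own DNS and mail servers will naturally top the DNS and SMTP tallies, so the interesting hosts are ordinary workstations that show up in the list.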

I had heard of Infobright. They designed a database that is basically write-once: you load the data, then query it. The data is accessed on a columnar basis rather than row by row, and they claimed it was fast. Perhaps fast enough to let me set up a real-time search for bots? So I created an Infobright database with my firewall logs. To keep things simple, I just counted all the DNS (port 53) traffic identified by the firewall. Infobright speaks MySQL, so the query was straightforward:

select count(service) from fwlogs where service='53';

For one month’s data it took 186 seconds. Not quite real time.
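To make the column-versus-row idea concrete, here is a toy Python sketch. It is an illustration only, not how Infobright actually stores data (its engine also compresses each column, among other things): a count on `service` needs only the service column, so a column store can skip the rest of each record.

```python
# Toy comparison of row-oriented and column-oriented layouts. This only
# illustrates the idea; it is not how Infobright actually stores data.

rows = [  # row store: each record keeps all of its fields together
    {"src": "10.0.0.5", "proto": "udp", "service": "53"},
    {"src": "10.0.0.7", "proto": "tcp", "service": "443"},
    {"src": "10.0.0.5", "proto": "udp", "service": "53"},
]

def count_rowwise(rows, value):
    # Scans record by record; on disk, a row store reads each whole record
    # just to get at the one field being counted.
    return sum(1 for row in rows if row["service"] == value)

def to_columns(rows):
    # Column store: one list per field, with positions aligned across lists.
    return {name: [row[name] for row in rows] for name in rows[0]}

def count_columnar(columns, value):
    # Touches only the "service" column; src and proto are never read.
    return columns["service"].count(value)

columns = to_columns(rows)
print(count_rowwise(rows, "53"), count_columnar(columns, "53"))
```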

I decided to try the same count with the “standard” UNIX/Linux command-line search tools. The following command took 246 seconds. Also not quite real time.

grep \|udp\|53 /usr/local/fw1-loggrabber/logs/* | wc -l
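The grep above matches the text `|udp|53` anywhere in a line, so it would also count, for example, UDP traffic on port 5353. A minimal Python sketch that compares whole fields instead follows; the positions of the protocol and service fields are hypothetical and need to match the real log layout.

```python
import fileinput
import glob

# Hypothetical positions of the protocol and service fields; adjust these to
# match the real log layout.
PROTO, SERVICE = 4, 5

def count_udp_port(lines, port="53"):
    """Count lines whose protocol field is exactly 'udp' and whose service
    field is exactly `port` (whole-field comparison, no substring matches)."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > SERVICE and fields[PROTO] == "udp" and fields[SERVICE] == port:
            total += 1
    return total

if __name__ == "__main__":
    paths = glob.glob("/usr/local/fw1-loggrabber/logs/*")
    if paths:  # fileinput would fall back to stdin if given no files
        # fileinput streams every file line by line instead of loading it all.
        print(count_udp_port(fileinput.input(paths)))
```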

One of the people I follow on Twitter is Hilary Mason, who tweets about all kinds of interesting things. One sunny afternoon she tweeted:

Benchmarking grep+awk+sed vs MongoDB.

And if it’s good enough for Dr. Mason…

So I downloaded MongoDB and installed it. It’s different. They tell me it’s “a scalable, high-performance, open source, schema-free, document-oriented database.” Hmmmm… “high-performance” eh?
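“Document-oriented” means each record is stored as a set of field/value pairs rather than as a row in a fixed table. A rough Python picture of what importing one log line involves is below. The field names are hypothetical, and mongoimport’s own type handling is more elaborate; as far as I know it stores number-looking CSV values as numbers, which would be consistent with the Mongo query later in this post matching `service` against the number 53 instead of the string '53' used in SQL.

```python
# Hypothetical field names for one firewall log record; the real export may
# use a different order or more fields.
FIELDS = ["time", "action", "src", "dst", "proto", "service"]

def guess_type(value):
    """Keep integer-looking strings as ints and everything else as text."""
    try:
        return int(value)
    except ValueError:
        return value

def to_document(line, sep="|"):
    """Turn one delimited line into a dict: a schema-free 'document'."""
    values = line.rstrip("\n").split(sep)
    return {name: guess_type(value) for name, value in zip(FIELDS, values)}

doc = to_document("1266000001|accept|10.0.0.5|192.0.2.53|udp|53")
print(doc)
```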

First I had to import my data. Mongo comes with a program called mongoimport, which imports CSV (Comma Separated Values), TSV (Tab Separated Values) and JSON (JavaScript Object Notation) files. I tried it. It didn’t work. After digging a bit, I realized my values were separated by pipe symbols (“|”), not commas. So I called on sed to help:

mkdir -p ./fwlogs
for f in /usr/local/fw1-loggrabber/logs/*; do
  sed 's/|/,/g' "$f" > "./fwlogs/$(basename "$f")"   # pipes -> commas for CSV import
done
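The sed swap works as long as no field contains a comma of its own; if one does, that CSV row gains an extra column. A minimal Python alternative (an illustration, not the conversion used here) relies on the standard `csv` module, which quotes such fields:

```python
import csv
import io

def pipes_to_csv(src, dst):
    """Copy pipe-delimited text from `src` to `dst` as CSV. Unlike a plain
    character swap, csv.writer quotes any field that contains a comma."""
    writer = csv.writer(dst)
    for line in src:
        writer.writerow(line.rstrip("\r\n").split("|"))

# Example with in-memory files; for real logs pass open() file objects.
out = io.StringIO()
pipes_to_csv(io.StringIO("1|accept|10.0.0.5|udp|53\n2|drop|note, with comma|tcp|25\n"), out)
print(out.getvalue())
```

For real files, open the output with `open(path, "w", newline="")`, as the `csv` module documentation recommends, so line endings are not translated twice.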
Now I could import my data without a problem! The search, using Python, took 49 seconds.
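from pymongo import MongoClient
client = MongoClient()              # local mongod on the default port
collection = client.fwdash.fwlogs   # collection name assumed; the original snippet never defined it
print(collection.count_documents({"service": 53}))  # count DNS (port 53) records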
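Counting port 53 is a first step; finding which machines generate the suspect traffic is the real goal. Assuming each document also carries a source-address field (called `src` here, which is hypothetical), a MongoDB aggregation pipeline can group and rank hosts. This sketch only builds the pipeline; it would be passed to pymongo’s `Collection.aggregate`. Note that the aggregation framework arrived in MongoDB 2.2, after this post was written.

```python
def suspect_pipeline(ports=(6667, 53, 25), limit=10):
    """Build an aggregation pipeline that counts, per source address, the
    records whose service is one of the suspect ports. The field names
    "service" and "src" are assumptions about how the logs were imported."""
    return [
        {"$match": {"service": {"$in": list(ports)}}},  # keep suspect ports only
        {"$group": {"_id": "$src", "hits": {"$sum": 1}}},  # one row per source
        {"$sort": {"hits": -1}},  # busiest sources first
        {"$limit": limit},
    ]

print(suspect_pipeline(limit=5))
```

Usage would look something like `list(client.fwdash.fwlogs.aggregate(suspect_pipeline()))`, with the `fwdash` database used above and a collection name that is, again, an assumption.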

I think Mongo will be my new favorite database for all kinds of applications. Look out, bots! You are in danger, Will Robinson!


  1. Emily said,

    Now I understand your postings…thanks for posting. Very interesting. Are there more ‘bots’ in pc-land or mac-land?

    • dgcombs said,

      From what I’ve seen in the research, most bots are in PC-land. However, they’re becoming more prevalent on Linux and Mac OS as well. Bot herders can potentially infect anything that will run a JavaScript eval command.
