March 26, 2010

Schlepping the Firewall Log – Part II

Posted in Uncategorized at 4:17 pm by dgcombs

I'm less afraid of bots today, but not by much. You know the kind: bots that lurk in every network. According to some studies, five to seven percent of all machines in my company's network are infected with one kind of bot or another. If I count up all the people and add half as many servers, there are some 800 to 1000 machines capable of supporting a bot. That means that somewhere between 40 and 70 bots live on the network just waiting to do something evil. In the first installment, I compared a few database systems to see which one gave me the most flexibility on the poor-quality hardware I have at hand. I chose MongoDB for a few sound reasons:

  • It offers interfaces in multiple languages such as Python and Java
  • It is flexible and permits a decided lack of schema
  • It is pretty fast, particularly when coupled with its indexing
  • And finally, it's one of the new, cool NoSQL toys that all the kids on the Internet are playing with

I pulled a snapshot of firewall logs into a MongoDB database to do some queries. The first thing I found out was that the Python interface, although definitely nifty, added a layer of abstraction that also added a layer of confusion to my thinking. As a result, I decided to drop back and punt.

MongoDB has an interactive shell based on JavaScript. In fact, it is an adaptation of SpiderMonkey, the JavaScript component of Firefox, Yahoo! Widgets and even Adobe Reader, if you can believe Wikipedia. Writing the queries in JavaScript and running them through the mongo client command made pulling out some interesting results much easier.

The Shadowserver Foundation hosts a page on how to detect botnets. It says, "Look for the most commonly used default irc port: 6667. The full port range specified by the RFC: 6660-6669,7000. Also, since many IRC services can utilize ident, port 113 can also serve as a (less common) detection parameter."

A quick query using the MongoDB shell reveals:

> db.fwlogs.find({service:6667}).count();
173

Yes, that does say there are 173 requests on the default IRC port, 6667. So let's go back and look at these.

> somebots = db.fwlogs.find({service:6667},{dst:1});

This shows the destination of all 173 connections. Using the toArray() feature moves the result set into memory (be careful with how much memory the returned documents will consume). Then you can extract the distinct ones using a well-known looping technique. Be sure to compare only the dst field, because the _id portion will always be unique.

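> mybots = somebots.toArray();
> var l = mybots.length;  // number of documents pulled into memory

for(var i=0;i<l;i++){
        // if a later document repeats this dst, advance i and rescan,
        // so only the last occurrence of each dst gets printed
        for(var j=i+1;j<l;j++) {
                if(mybots[i].dst === mybots[j].dst) j = ++i;
        }
        print(mybots[i].dst);
}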
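The nested loop compares every pair of documents, which is fine for 173 of them but slows down as the result set grows. As a rough alternative sketch, not something from my original session, you can use the dst values as keys of a plain object, which finds the distinct ones in a single pass and counts the connections per destination along the way. The helper name countByDst and the sample addresses below are made up for illustration; in the shell you would pass in mybots instead.

```javascript
// Count how many documents share each dst value.
// "docs" is expected to look like the mybots array: [{dst: "..."}, ...]
function countByDst(docs) {
    var counts = {};
    for (var i = 0; i < docs.length; i++) {
        var dst = docs[i].dst;
        counts[dst] = (counts[dst] || 0) + 1;
    }
    return counts;
}

// Made-up sample standing in for mybots (documentation-range addresses).
var sample = [
    { dst: "192.0.2.10" },
    { dst: "198.51.100.7" },
    { dst: "192.0.2.10" }
];

var counts = countByDst(sample);

// The distinct destinations are simply the keys of the counts object.
var distinctDsts = [];
for (var key in counts) {
    distinctDsts.push(key);
}
```

The MongoDB shell also has a distinct() helper on collections, along the lines of db.fwlogs.distinct("dst", {service: 6667}), which skips the in-memory array altogether if your version supports it.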

Now we can extract the meaningful destination addresses and find out if they're bot command and control sites or not. Uh-oh. Spot checks on some of the IP addresses indicate they're mostly owned by US companies like Level 3, Comcast and TW Telecom, with one or two based in Europe and one standout in Uruguay. It looks like it's time to change parameters and start finding those needles hidden in my network's haystack.
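One possible way to change parameters is to widen the search from port 6667 alone to the full set of ports in the Shadowserver quote. The sketch below only builds the filter document; the names ircPorts and ircFilter are invented for illustration, and it assumes, as the queries above do, that the service field holds a numeric port.

```javascript
// Ports named in the Shadowserver quote: 6660-6669, 7000, and ident on 113.
var ircPorts = [];
for (var p = 6660; p <= 6669; p++) {
    ircPorts.push(p);
}
ircPorts.push(7000, 113);

// Filter document using MongoDB's $in operator, which matches any
// document whose service value appears in the list.
var ircFilter = { service: { $in: ircPorts } };

// In the mongo shell this could then be used as, for example:
//   db.fwlogs.find(ircFilter).count();
```

Port 113 is likely to be noisy, since ident is used by more than IRC, so it may be worth running it as a separate query.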

Posted via email from Meyeview (Posterous Style)
