October 21, 2010

Rethinking my Firewall Logs

Posted in Uncategorized at 3:43 pm by dgcombs

In a resource-constrained environment where log storage is at a premium, it might make sense to consider what you really need from your firewall logs. For example, the security engineer who enabled firewall logging in my current environment simply logged everything. I know why he did this. The primary reason was that IT operations was in the habit of blaming the firewall for everything. If "Sally" couldn't get to that web server on the DMZ, the first stop would be Security, and the first question would be "Do you see any dropped packets between Sally and the DMZ?" So Security played a very visible role in troubleshooting operational connectivity issues instead of monitoring the security posture of the network. Simply tracking every single stateful connection (i.e. Source/Source Port to Destination/Destination Port) results in a lot of log data. A lot of that data, while helpful for operational problem resolution, becomes a security haystack when you're trying to figure out whether a botnet has entered your network.

Another problem is determining how much log data you need to keep. Again, the operations side of the house would like to fill that log disk to the very brim, removing only as much as comes in to keep an "optimal" amount of data. Why? Just in case they need it for troubleshooting a problem. It's almost like asking a ten-year-old boy how much ice cream he wants. The answer is usually along the lines of "How much is there?"

A much more streamlined approach is to track only the destination connection information. By limiting the logs to Time/Destination/Service, you've more than halved the sheer amount of data you're collecting. That alone makes sorting and searching more responsive. I also don't think you need to keep more than ten days' worth of data in an online search tool. Ten days gives you more than a week's data for comparing Monday this week to Monday last week (well, up to Wednesday, anyway). Thirty days of log data just compounds the haystack. Now you're searching through a barn full of data for that needle.

Finally, I'm not at all sure that I need to track packets that get dropped at the firewall. I know the operations guys will have a fit. But this is a security application, not an IT debugger. If the firewall dropped the packet, it tripped over a rule. What I'm more concerned about are those packets that made it through the firewall, going in or coming out. Those are the packets that can cause a very real security problem.

I have been using a combination of FW1-Loggrabber and mongoimport to pull log data from my firewalls and stash it in a database. FW1-Loggrabber has been around for quite some time; its last update was in 2005. It is, however, the granddaddy of all Check Point log export tools. Since it is open source, it has been modified and customized to work with Splunk, Sawmill, and Q1Labs QRadar, to name just a few. There is even a SANS whitepaper by Mark Stingley, who customized it to pump log data and rule UID information into a MySQL database. Each of these derivatives has a particular focus and strips out the features it doesn't need. The stock version of FW1-Loggrabber comes with its own set of limitations. Foremost among them is that the filter selectors no longer work as documented.

The final version of FW1-Loggrabber was written when Check Point had just released the NG version of their firewall software. Since then, some things in the API have changed. An R65 firewall no longer uses the same rule filter names to select records for extraction. For example, if you select "ACCEPT" records, FW1-Loggrabber gives you "DECRYPT" records. If you ask for "DROP" records, you get "MONITOR" records. And if you want "DECRYPT," you'll get "KEYINST" instead. At least "CTL" still gives you "CTL." But don't ask for "REJECT" or you'll be looking at "ENCRYPT."
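Summarized as a lookup table, this is just the behavior described above restated as Python (the dict name is mine):

R65_ACTION_SHIFT = {      # filter name you request -> records you actually get
    "accept":  "decrypt",
    "drop":    "monitor",
    "decrypt": "keyinst",
    "ctl":     "ctl",
    "reject":  "encrypt",
}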

So by carefully constructing a FW1-Loggrabber configuration file where

FW1_FILTER_RULE="action!=accept,ctl,drop,reject,decrypt"

and

FIELDS=time;orig;src;dst;xlatedst;service

I can minimize the incoming data to what is actually relevant and useful to me. Because of the shifted action names, this inverted selector returns only dropped and accepted records, and since the firewall is configured not to log drops, dropped packets never appear in the export. In the fields section, xlatedst is used only to determine whether the packet is inbound to a server on the DMZ; it is not itself stored. All IP information is stored as a large integer to make sorting and indexing easier.
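There is nothing exotic about the integer conversion. Here is a minimal sketch of how I'd write it in Python (the function names are mine and may not match the actual script):

import socket
import struct

def ip2int(ip):
    """Convert a dotted-quad string like '192.168.1.10' to a 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def int2ip(n):
    """Convert the integer back to dotted-quad form for display."""
    return socket.inet_ntoa(struct.pack("!I", n))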

I decided to take it one more step. MongoDB uses collections within databases, so during the import phase I pre-separate the data into inbound and outbound collections by firewall cluster. That way the amount of data in any one collection is much smaller than when it was all crammed into one gigantic database with a huge logs collection. The import takes approximately 60% less time than importing the full log data into that monolithic collection. The key portion of the Python script importing the data is

if xlatedst:                   # NATed packet headed for a server on the DMZ
    dst = ip2int(xlatedst)
else:
    dst = ip2int(dst)
if orig == FW1a:               # the log record came from the FW1a cluster
    if FW1aLo < src < FW1aHi:  # the source address is in the network behind FW1a
        fw1aout.insert({"time": logtime, "dst": dst, "svc": svc})
    else:                      # the source address is on the Internet
        fw1ain.insert({"time": logtime, "dst": dst, "svc": svc})
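For context, here is a sketch of the plumbing that snippet assumes. The database and collection names, the address constants, and the ten-day TTL index are my guesses, not the actual script (note that .insert() was pymongo's call of that era; modern pymongo spells it insert_one()):

from pymongo import MongoClient

db = MongoClient().fwlogs        # hypothetical database name
fw1ain  = db.fw1ain              # inbound collection for the FW1a cluster
fw1aout = db.fw1aout             # outbound collection for the FW1a cluster

FW1a   = "203.0.113.1"           # hypothetical value of the orig field for FW1a
FW1aLo = ip2int("10.1.0.0")      # hypothetical network range behind FW1a
FW1aHi = ip2int("10.1.255.255")

# One way to enforce the ten-day retention window is a TTL index; this
# assumes the "time" field is stored as a real datetime, not a string.
for coll in (fw1ain, fw1aout):
    coll.create_index("time", expireAfterSeconds=10 * 24 * 3600)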

Now if an infected machine tries to connect to a botnet command-and-control system over port 53, usually reserved for DNS traffic, I can spot it a mile away. Well, at least from my web browser.
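As a rough illustration, that search might look like the sketch below. The collection name, the sanctioned resolver address, and the service value of 53 are all hypothetical, but it shows the shape of the query:

from pymongo import MongoClient

db = MongoClient().fwlogs              # hypothetical database name
approved = [ip2int("10.1.0.53")]       # hypothetical sanctioned DNS resolver

# Outbound connections on the DNS service that did not go to our own
# resolver are candidates for botnet command-and-control traffic.
for rec in db.fw1aout.find({"svc": 53, "dst": {"$nin": approved}}):
    print(int2ip(rec["dst"]), rec["time"])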
