The perils of event-log tracking

Over half a year ago I mentioned that I was going to write up a script that sucks down Windows security log info and deposits the summarized data into a database. That script has been pretty much done for several months now, and the data is finally getting some attention on campus. And with attention comes...

...feature requests.

It's good to have them, but they're tricky to deal with, thanks to a quirk in how these logs are gathered and how Microsoft records the data.
  • Login events record the User and the IP address the request came from. The machine they're logging into is the Domain Controller in this case.
  • Account Lockout events record the machine that performed the lockout. This would be things like their workstation, or the Outlook Web Access servers.
Since we use DHCP on campus, the machine-to-IP association is not static. Our desktop support users like having the machine name wherever possible, with machine name AND IP being preferred. That will prove tricky to provide. It doesn't help that a lot of login events come from IPs that trace to our hardware load-balancers, which generally means a login via LDAP by way of our single-sign-on product (CAS). That's not a domained machine, obviously.

There are a couple of ways to solve this problem, but none of them are terribly good.
  • During event-parse, use WMI to query the machine at that IP address and ask it what its name is. Should work fast for domained machines, but will be horribly slow for undomained machines.
  • Use a reverse DNS lookup. Unfortunately, we use BIND for our reverse-DNS records, so we'll get a lot of useless dhcp134-bh.bh.wwu.edu style names that won't correspond to the 'machine login' events I'm already tracking.
  • Do DB queries to find out which machine logged in with that IP address most recently and use that. But all those lookups will HORRIBLY slow down parsing, even if I keep a lookup table during the parsing run.
Slowing down parsing is not a good thing, since we chew through between 70K and 300K events every 15 minutes during busy periods. Even small efficiency dings can be horribly magnified in such an environment.
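For the curious, the per-run lookup table from that third option would look something like the sketch below. This is not the actual parser, just a minimal Python illustration; MachineLookupCache, fake_db_lookup, and the sample IPs and machine names are all made up for the example. It shows where the cost goes: repeat IPs are free, but every new IP still pays for a full database query.

```python
from typing import Callable, Dict, Optional


class MachineLookupCache:
    """Per-run lookup table: IP address -> machine name (None if unknown)."""

    def __init__(self, slow_lookup: Callable[[str], Optional[str]]) -> None:
        self._slow_lookup = slow_lookup
        self._table: Dict[str, Optional[str]] = {}
        self.misses = 0  # number of times we had to hit the database

    def machine_for(self, ip: str) -> Optional[str]:
        # Only go to the database the first time an IP shows up in this run.
        if ip not in self._table:
            self.misses += 1
            self._table[ip] = self._slow_lookup(ip)
        return self._table[ip]


# Hypothetical stand-in for the real "who had this IP last?" database query.
FAKE_DB = {"10.0.0.5": "LAB-PC-01"}


def fake_db_lookup(ip: str) -> Optional[str]:
    return FAKE_DB.get(ip)


cache = MachineLookupCache(fake_db_lookup)
event_ips = ["10.0.0.5", "10.0.0.5", "10.0.0.9", "10.0.0.5", "10.0.0.9"]
names = [cache.machine_for(ip) for ip in event_ips]
print(names, "database hits:", cache.misses)
```

Unknown IPs are cached too (as None), so the load-balancer addresses only cost one miss each per run. The table has to be thrown away between runs, though, since DHCP can hand the address to a different machine at any time.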

Nope, looks like the best solution here is to make the IP address a clickable link that runs a second query listing the most recent machines to hold that IP address, if any show up at all.
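The query behind that link could be as simple as the sketch below. I'm using Python and an in-memory SQLite database purely for illustration; the machine_logins table, its columns, and the sample rows are assumptions made up for the example, not the real schema.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per 'machine login' event.
SCHEMA = """
CREATE TABLE machine_logins (
    machine    TEXT NOT NULL,
    ip         TEXT NOT NULL,
    event_time TEXT NOT NULL  -- ISO-style timestamp, so text order is time order
)
"""

QUERY = """
SELECT machine, MAX(event_time) AS last_seen
FROM machine_logins
WHERE ip = ?
GROUP BY machine
ORDER BY last_seen DESC
LIMIT ?
"""


def recent_machines_for_ip(conn: sqlite3.Connection, ip: str,
                           limit: int = 5) -> List[Tuple[str, str]]:
    """Return (machine, last_seen) pairs for an IP, newest first."""
    return conn.execute(QUERY, (ip, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO machine_logins (machine, ip, event_time) VALUES (?, ?, ?)",
    [
        # Made-up sample data.
        ("LAB-PC-01", "10.0.0.5", "2010-03-01 09:00:00"),
        ("LAB-PC-02", "10.0.0.5", "2010-03-02 14:30:00"),
        ("LAB-PC-01", "10.0.0.5", "2010-03-01 11:00:00"),
        ("LAB-PC-03", "10.0.0.9", "2010-03-02 08:00:00"),
    ],
)

result = recent_machines_for_ip(conn, "10.0.0.5")
empty = recent_machines_for_ip(conn, "10.0.0.77")
print(result)
print(empty)
```

The point of doing it this way is that the lookup only runs when somebody actually clicks, so the parser never pays for it. An IP with no machine logins on record (a load-balancer address, say) just comes back as an empty list.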

I wonder what they'll ask for next?