Virus Alert!

The Internet Storm Center had a nice post this morning.

It seems malware authors have taken to dynamically generating binaries for each and every client that contacts them. This understandably makes traditional virus-checkers useless, which is the whole point. This is another step in the game of cat-and-mouse that has been going on with Microsoft OS viruses since w-a-y back in the MS-DOS days. I had a computer infected with "ping-pong" once.

In the beginning, a virus checker was simply a regex engine, capable of efficiently parsing binary files, wrapped around a static list of regexes associated with known viruses. Cleaning files involved a bit more logic, but for non-destructive bugs it wasn't too hard. This worked for some time, as all known viruses had to infect files to propagate themselves.
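To make that concrete, here is a minimal sketch of a signature scanner in Python. The signature names and byte patterns are invented for illustration and are not real virus signatures; an actual engine of the era would have been far more optimized.

```python
import re

# Hypothetical signature database: name -> byte pattern.
# Both patterns are made up for this example.
SIGNATURES = {
    "Example.A": re.compile(rb"\xde\xad\xbe\xef.{2}\x90\x90", re.DOTALL),
    "Example.B": re.compile(rb"EVIL_MARKER"),
}

def scan_bytes(data: bytes) -> list[str]:
    """Return the sorted names of every signature found in data."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern.search(data)
    )

def scan_file(path: str) -> list[str]:
    """Read a file in binary mode and run it past the signature list."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read())
```

Adding a detection is just adding another entry to the list, which is why updates could ship as a plain data refresh.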

Then the hackers figured out how to infect boot sectors, and the boot-sector virus was born. Anti-virus engines had to evolve to provide the same regex support on the boot sector. So now AV engines had to check files being launched and an actual disk structure. As before, this worked for some time.

In these early days of viruses, it could take months for a new virus to spread widely enough to be detected. Anti-virus updates were sent out as needed, and eventually came monthly. For quite some time, McAfee sent out a static .exe file (Scan.exe) with the detection database hard-coded into it. So you saw .ZIP archives on BBS download sites with the shareware, etc.

Then the virus writers figured out how to do "polymorphic" code. This was an advance, as the regex-based AV engines couldn't detect code that CHANGED from infection to infection. In response, AV engines had to have much greater intelligence built into them. There was some back-and-forth for a while with code that really wasn't as regex-proof as the virus writers thought. For instance, certain virus-writing toolkits left their own signatures that could be detected.
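A toy illustration of why this hurt, assuming the simplest possible scheme: the body is XOR-encoded with a different one-byte key on each infection. The payload bytes and the decoder stub below are invented stand-ins, and real polymorphic engines were a good deal more elaborate.

```python
PAYLOAD = b"EVIL_MARKER payload body"  # stand-in bytes, not real code
DECODER_STUB = b"\xeb\x10STUB"         # hypothetical constant stub left by a toolkit

def make_variant(key: int) -> bytes:
    """Build one 'infection': stub, key byte, then the XOR-encoded payload."""
    encoded = bytes(b ^ key for b in PAYLOAD)
    return DECODER_STUB + bytes([key]) + encoded

def payload_signature_hit(sample: bytes) -> bool:
    """Old-style check: look for the payload's plain bytes."""
    return b"EVIL_MARKER" in sample

def stub_signature_hit(sample: bytes) -> bool:
    """Check for the constant decoder the toolkit leaves behind."""
    return DECODER_STUB in sample

first = make_variant(0x5A)
second = make_variant(0x21)
```

The two variants share no payload bytes, so the old payload signature misses both, while a signature on the unchanging stub still catches them.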

Then the virus writers figured out how to infect the BIOS of computers. CIH is the classic example. This is an approach that didn't get widespread use, since infecting a BIOS is a hardware-dependent activity. Nevertheless, this caused antivirus vendors to modify their products to be able to clean the damage caused by this kind of bug.

Then came the era of application-viruses. MS Office came with a very feature-rich macro language that was exploited to send bad code around the world. Yet another AV update to handle this kind of problem.

Extending from the macro viruses came the first mass-mailers. While AV on mail gateways had been around for a while thanks to existing macro viruses, the mass-mailers made it amply clear to everyone that such protection was a requirement. They also provided a nice scale test for those solutions.

And that's about when things stopped being just-for-the-mayhem and moved to profit-driven. You don't see truly destructive viruses anymore. If you've subverted a workstation, you may as well do things to it that'll earn you money, like install a bot-client.

And now we have bots being installed with custom compiled code! We no longer have "viruses"; we now have malware and spyware. The old regex-engine based AV methods are still somewhat viable for older threats, but the future is solidly in behavior-based detection. Spam spewers can come in many, many shapes, sizes, and colors. This sort of heuristic detection is a lot harder to code than fancy regex. It is also a lot harder to make "false-positive proof".

Case in point. You purchase a software package to help you put together newsletters for your quilt shop. Once a month you send out 430 identical emails to your list. Heuristic scanners can see this behavior and start throwing alerts.
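A sketch of how a naive behavioral rule ends up flagging the quilt shop. The threshold and the function name are assumptions made up for this example, not taken from any real product.

```python
import hashlib
from collections import Counter

# Hypothetical rule: more than this many identical bodies from one
# workstation looks like a mass-mailer.
IDENTICAL_BODY_THRESHOLD = 100

def looks_like_mass_mailer(bodies: list[str],
                           threshold: int = IDENTICAL_BODY_THRESHOLD) -> bool:
    """Flag a batch of outgoing mail if any single body repeats too often."""
    counts = Counter(
        hashlib.sha256(body.encode("utf-8")).hexdigest() for body in bodies
    )
    return any(count > threshold for count in counts.values())

newsletter_run = ["October sale on fat quarters!"] * 430
ordinary_day = [f"Reply to customer {n}" for n in range(40)]
```

The rule cannot tell the newsletter from a worm, because at this level the behavior is identical. Telling them apart takes more context (who the recipients are, which program sent it), and that is where the coding gets hard.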

End users HATE false-positives and unwarranted fear. This provides a disincentive to make aggressive heuristic scanners, and an incentive to rely upon detection databases instead.

One false-positive that has annoyed NetWare engineers for years is McAfee's hatred of SERVER.EXE, which causes service-pack installs to bomb. Scanning this file specifically, rather than marking it bad based on name and location, would show that this is a file that can't run in Win32 protected mode and is thus no threat. This is a case of an incomplete heuristic. Not scanning the file, though, does provide some speed bonuses.
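For what it's worth, a content check along those lines can be very cheap. The sketch below assumes "can't run in Win32" can be approximated by "has no PE header", and uses only the standard layout of Windows executables: an "MZ" stub whose field at offset 0x3C points to a "PE\0\0" signature. It is not how McAfee or any other product actually does it.

```python
import struct

def has_pe_header(data: bytes) -> bool:
    """Return True if data starts with an MZ stub that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Synthetic test images, built by hand for this example.
dos_only = b"MZ" + bytes(0x3E)             # MZ stub, no PE header
windows_image = bytearray(0x84)
windows_image[:2] = b"MZ"
struct.pack_into("<I", windows_image, 0x3C, 0x80)
windows_image[0x80:0x84] = b"PE\x00\x00"
```

Reading the first few dozen bytes costs more than a name-and-path match, but far less than a full scan.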

Gateway scanners, like those for email, are largely stuck with database-driven detection methods. Once processing power increases faster than email volume (not likely), you may be able to see in-depth analysis of what a file COULD do once it gets to where it is going, but I don't expect to see systems like that for a number of years. End-point scanners, like those on workstations, have a much richer feature set to play with, and heuristic scanning will work much better there. We just need product.