Recently in spam Category

It's the little things

| 1 Comment | No TrackBacks
My attention was drawn to something yesterday that I just hadn't registered before. Perhaps because I see it so often I didn't twig to it being special in just that place.

Here are the Received: headers of a bugzilla message I got yesterday. It's just a sample. I've bolded the header names for readability:
Received: from ExchEdge2.cms.wwu.edu (140.160.248.208) by ExchHubCA1.univ.dir.wwu.edu (140.160.248.102) with Microsoft SMTP Server (TLS) id 8.1.393.1; Tue, 15 Sep 2009 13:58:10 -0700
Received: from mail97-va3-R.bigfish.com (216.32.180.112) by
ExchEdge2.cms.wwu.edu (140.160.248.208) with Microsoft SMTP Server (TLS) id 8.1.393.1; Tue, 15 Sep 2009 13:58:09 -0700
Received: from mail97-va3 (localhost.localdomain [127.0.0.1]) by mail97-va3-R.bigfish.com (Postfix) with ESMTP id 6EFC9AA0138 for me; Tue, 15 Sep 2009 20:58:09 +0000 (UTC)
Received: by mail97-va3 (MessageSwitch) id 12530482889694_15241; Tue, 15 Sep 2009 20:58:08 +0000 (UCT)
Received: from monroe.provo.novell.com (monroe.provo.novell.com [137.65.250.171]) by mail97-va3.bigfish.com (Postfix) with ESMTP id 5F7101A58056 for me; Tue, 15 Sep 2009 20:58:07 +0000 (UTC)
Received: from soval.provo.novell.com ([137.65.250.5]) by
monroe.provo.novell.com with ESMTP; Tue, 15 Sep 2009 14:57:58 -0600
Received: from bugzilla.novell.com (localhost [127.0.0.1]) by soval.provo.novell.com (Postfix) with ESMTP id A56EECC7CE for me; Tue, 15 Sep 2009 14:57:58 -0600 (MDT)
For those who haven't read these kinds of headers before, read from the bottom up. The mail flow is:
  1. Originating server was Bugzilla.novell.com, which mailed to...
  2. soval.provo.novell.com running Postfix, who forwarded it on to Novell's outbound mailer...
  3. monroe.provo.novell.com, who attempted to send to us and sent to the server listed in our MX record...
  4. mail97-va3.bigfish.com running Postfix, who forwarded it on to another mailer on the same machine...
  5. mail97-ca3-r running something called MessageSwitch, who sent it on to the internal server we set up...
  6. exchedge2.cms.wwu.edu running Exchange 2007, who send it on to the Client Access Server...
  7. exchhubca1.univ.dir.wwu.edu for 'terminal delivery'. Actually it went on to one of the Mailbox servers, but that doesn't leave a record in the SMTP headers.
Why is this unusual? Because steps 4 and 5 are at Microsoft's Hosted ForeFront mail security service. The perceptive will notice that step 4 indicates that the server is running Postfix.

Postfix. On a Microsoft server. Hur hur hur.

Keep in mind that Microsoft purchased the ForeFront product line lock stock and barrel. If that company had been using non-MS products as part of their primary message flow, then Microsoft probably kept that up. Next versions just might move to more explicitly MS-branded servers. Or not, you never know. Microsoft has been making placating notes towards Open Source lately. They may keep it.
Remember this from a month ago? As threatened in that post I did go ahead and call Microsoft. To my great pleasure, they were able to reproduce this problem on their side. I've been getting periodic updates from them as they work through the problem. I went through a few cycles of this during the month:

MS Tech: Ahah! We have found the correct regex recipe. This is what it is.
Me: Let's try it out shall we?
MS Tech: Absolutely! Do you mind if we open up an Easy Assist session?
Me: Sure. [does so. Opens sends a few messages through, finds an edge case that the supplied regex doesn't handle]. Looks like we're not there yet in this edge case.
MS Tech: Indeed. Let me try some more things out in the lab and get back to you.

They've finally come up with a set of rules to match this text definition: "Match any X-SpamScore header with a signed integer value between 15 and 30".

Reading the KB article on this you'd think these ORed patterns would match:
^1(5|6|7|8|9)$
^2\d$
^30$
But you'd be wrong. The rule that actually works is:
(^1(5$|6$|7$|8$|9$))|(^2(\d$))|(^3(0$))
Except if ^-
Yes, that 'except if' is actually needed, even though the first rule should never match a negative value. You really need to have the $ inside the parens for the first statement, or it doesn't match right; this won't work: ^1(5|6|7|8|9)$. The same goes for the second statement with the \d$ constructor. The last statement doesn't need the 0$ in parens, but is there to match the pattern of the previous two statements of having the $ in the paren.

Riiiiiight.

In the end, regexes in Exchange 2007 Transport Rules are still broken, but they can be made to work if you pound on them enough. We will not be using them because they are broken, and when Microsoft gets around to fixing them the hack-ass recipes we cook up will probably break at that time as well. A simple value list is what we're using right now, and it works well for 16-30. It doesn't scale as well for 31+, but there does seem to be a ceiling on what X-SpamScore can be set to.
Exchange 2007 supports a limited set of Regular Expressions in its transport-rules. The Microsoft technet page describing them is here. Unfortunately, I believe I've stumbled into a bug. We've recently migrated our AntiSpam to ForeFront. And part what ForeFront does is header markup. There is a Spamminess number in the header:
X-SpamScore: 66
That ranges from deeply negative to over a hundred. With this we can structure transport-rules to handle spammy email. In theory, the following trio of regexes should catch anything with a score of 15 or higher:
^1(5|6|7|8|9)$
^(2|3|4|5|6|7|8|9)\d$
^\d\d\d$
Those of you that speak Unix regex are quirking an eyebrow at that, I know. Like I said, Microsoft didn't do the full Unix regex treatment. The "\d" flag, "matches any single numeric digit." The parenthetical portion, "Parentheses act as grouping delimiters," and, "The pipe ( | ) character performs an OR function."

Unfortunately, for reasons that do not match the documentation the above trio of regexes is returning true on this:
X-SpamScore: 5
It's the second recipe that's doing it, and it looks to be the combination of paren and \d that's the problem. For instance, the following rule:
^\d(6|7)$
returns true for any single numeric value, but returns false for "56". Where this rule:
^5(6|7)
only returns true for 56 and 57. To me this says there is some kind of interaction going on between the \d and the () constructors that's causing it to change behavior. I'll be calling Microsoft to see if this is working as designed and just not documented correctly, or a true bug.

Email reputation

| No Comments | No TrackBacks
One of the hot new things in anti-spam technology is something that's rather old. Yes, the Realtime Blackhole List is back. Only these RBL's aren't the old school DNS servers of yesteryear, these RBLs are maintained by the big anti-spam vendors and are completely proprietary. The new name is now, "IP Reputation," and that's showing up on the marketing glossies.

The idea is that you deploy a network of sensors (say, every anti-spam appliance you ship, or software-package installed) that relay spam/ham information back to home base. Home base then builds a profile of the behaviors of the incoming IP connections. Once certain completely proprietary threshold are crossed, the anti-spam vendor then publishes that particular IP addresses reputation to their service. The installed base then queries the reputation service on every incoming TCP connection to see how to handle that connection.

The response varies from vendor to vendor, but include:
  • Outright blocking. Do not accept traffic from this IP address. The connection is terminated before any SMTP commands can be issued. Do not pass EHLO. Do not collect 220-ESMTP.
  • Deferr. Issue a 421 error message. Smart mailers will attempt redelivery later. Bots are generally too stupid to try this and just pass on to the next address on their list.
  • Throttle. Get very slow in accepting mail. Take a long time to issue 250-Ready statuses after SMTP commands.
The nice thing about IP reputation is that it is fast and cheap. Instead of having to lexically scan every incoming email for spamminess, you can just look at the source's reputation and block a very large percentage of messages. When we turned this on for our spam product a while back, the reputation filter blocked between 90% to 95% of all messages ultimately blocked as spam. Clean email is the single most expensive mail to pass since it has to go through every single stage of the spam/ham test pipeline, and blocking things earlier in the pipeline is a good way to shed load.

Not all optimizations are without side effects, and this one wasn't. The former student email server, titan, got itself 'greylisted' due to spam quantities. Around 50% of the message traffic into Exchange from this system was ultimately blocked as Spam according to the old anti-spam appliances we had (we'd routed its mail through the 'outbound' queue on those appliances so it wouldn't be subject to reputation tests, but would still scan email). As part of the migration of student email to OutlookLive.Edu, we set up forwards from the old cc.wwu.edu addresses to the new addresses. The spam-checkers on titan were of poor enough quality that enough spam got through to cause OutlookLive to start grey-listing Titan, causing mail to really back up on it.

That's not the only thing. Certain mailers managed by departments other that ITS here at WWU have managed to get themselves greylisted or outright blacklisted on these proprietary reputation lists. The one common denominator we've found is that certain specific UNIXy mailers do not apply their anti-spam processes to mail that is subjected to a .forward. At least, not without specific config telling it to scan that traffic. So if a person on one of these mailers has a .forward sending all mail into Exchange, the full spam-filled feed heads to Exchange and the reptuation of that mailer gets dinged.

Which is a long way of saying that, ahem:

In this era of IP reputation, outbound spam filtering is now just as required as inbound.

Really. Go do it. It'll help prevent blacklistings, and that sucks for anyone subjected to it.

ForeFront and spam

| No Comments | No TrackBacks
They have an option to set a custom X-header for indicating spam. The other options are subject-line markup and quarantine on the ForeFront servers. What they never document is what they set the header to. As it happens, if the message is spam it gets set like this:
X-WWU-JunkIt: This message appears to be spam.
Very basic. And not documented. Now that we know what it looks like we can create a Transport Rule that'll direct such mail to the junk folder in Outlook. Handy!
Yesterday we got some concerned mails from the one of the groups who sends mail by way of one of our web-servers. It's a somewhat critical function they do, so we paid attention to it. It seems they were getting bounce-messages from comcast.net. The bounce said that the incoming IP address did not have a reverse lookup (PTR record) and they don't talk to people like that.

This was confusing. Because we really do have a PTR record for that particular mailer. And yet, getting bounces. So one of the Webdevs calls Comcast to ask politely what the heck, and the Comcast support person walks them through a series of steps to demonstrate what went wrong. According to them, or so implied the webdev who doesn't speak SMTP as well as we do, the problem was that 'wwu.edu' does not resolve to an IP address.

There are reasons we haven't done this, and they have to do with mail delivery. Certain stupid mailers will deliver to a resoveable host before searching MX records, and if "wwu.edu" is resoveable, it'll attempt delivery to THAT instead of where it should. The server that runs 'www.wwu.edu' is the one that we'd have to point 'wwu.edu' to, and it is not a mail host. Far from. This seemed to be a strange requirement of Comcast.

I cracked it earlier today. You see, if you take a look at the NameServer records for the "wwu.edu" domain you will find three records.

140.160.242.13
140.160.240.12
216.186.4.245

It's that last one that's the problem. For some reason, our offsite DNS didn't have that particular reverse-lookup domain replicated to it. So if Comcast used it for resolving the incoming IP, it would get 'UNKNOWN' and block the connection. If they picked one of the other two, it would resolve and delivery would continue. Tada! The Comcast error message really was true, we just didn't realize one of our DNS servers didn't have all the data it needed. Oops.

Explaining email

| 1 Comment | No TrackBacks
In the wake of this I've done a lot of diagramming of email flow to various decision makers. Once upon a time, when the internet was more trusting, SMTP was a simple protocol and diagramming was very simple. Mail goes from originator, through one or more message transfer agents (or none at all and deliver directly, it was a more trusting time back then) to its final home, where it gets delivered. SMTP was designed in an era when UUCP was still in widespread use.

Umpteen years later and you can't operate a receiving mail server without some kind of anti-spam product. Also, anti-spam has been around as an industry long enough that certain metaphors are now ingrained in the user experience. Almost all products have a way to review the "junk e-mail". Almost all products just silently drop virus-laden mail without a way to 'review' it. Most products contain the ability to whitelist and blacklist things, and some even let that happen at the user level.

What this has done is made the mail-flow diagram bloody complicated. As a lot of companies are realizing the need to have egress filters in addition to the now-standard ingress filters, mail can get mysteriously blocked at nearly every step of the mail handling process. The mail-flow diagram now looks like a flow-chart since actual decisions beyond "how do I get mail to domain.com" are made at many steps of the process.

The flow-charts are bad enough, but they can be explained comprehensibly if given enough time to describe the steps. What takes even more time is when deploying a new anti-spam product and deciding how much end-user control to add (do we allow user-level white lists? Access to the spam-quarantine?), what kinds of workflow changes will happen at the helpdesk (do we allow helpdesk people access to other people's spam-quarantine? Can HD people modify the system whitelist?), or overall architectural concerns (is there a native plugin that allows access to quarantine/whitelist without having to log in to a web-page? If the service is outsourced (postini, forefront) does it provide a SSO solution or is this going to be another UID/PW to manage?).

And I haven't even gotten to any kind of email archiving.
As mentioned in the Western Front, we're finally migrating students to the new hosted Exchange system Microsoft runs. They've since changed the name from Exchange Labs to OutlookLive. It has taken us about two quarters longer than we intended to start the migration process, but it is finally under way.

Unfortunately for us, we got hit with a problem related to conflicting mail priorities. But first, a bit of background.

ATUS was getting a lot of complaints from students that the current email system (sendmail, with SquirrelMail) was getting snowed under with spam. The open-source tools we used for filtering out spam were not nearly as effective as the very expensive software in front of the Faculty/Staff Exchange system. Or much more importantly, were vastly less effective than the experience Gmail and Hotmail give. Something had to change.

That choice was either to pay between $20K and $50K for an anti-spam system that actually worked, or outsource our email for free to either Google or Microsoft. $20K.... or free. The choice was dead simple. Long story short, we picked Microsoft's offering.

Then came the problem of managing the migration. That took its own time, as the Microsoft service wasn't quite ready for the .EDU regulatory environment. We ran into FERPA related problems that required us to get legal opinions from our own staff and the Registrar relating to what constitutes published information, which required us to design systems to accommodate that. Microsoft's stuff didn't make that easy. Since then, they've rolled out new controls that ease this. Plus, as the article mentioned, we had to engineer the migration process itself.

Now we're migrating users! But there was another curveball we didn't see, but should have. The server that student email was on has been WWU's smart-host for a very long time. It also had the previously mentioned crappy anti-spam. Being the smart-host, it was the server that all of our internal mail blasts (such as campus notifications of the type Virginia Tech taught us to be aware of) relayed through. These mail blasts are deemed critical, so this smart-host was put onto the OutlookLive safe-senders list.

Did I mention that we're forwarding all mail sent to the old .cc.wwu.edu address to the new students.wwu.edu address? The perceptive just figured it out. Once a student is migrated, the spam stream heading for their now old cc.wwu.edu address gets forwarded on to OutlookLive by way of a server that bypasses the spam checker. Some students are now dealing with hundreds of spam messages in their inbox a day.

The obvious fix is to take the old mail server off of the bypass list. This can't be done because right now critical emails are being sent via the old mail server that have to deliver. The next obvious fix, turn off forwarding for students that request it, won't work either since the ERP system has all the old cc.wwu.edu addresses hard-coded in right now and the forwards are how messages from said system get to the students.wwu.edu addresses. So we geeks are now trying to set up a brand new smart-host, and are in the process of finding all the stuff that was relaying through the old server and attempting to change settings to relay through the new smart-host.

Some of these settings require service restarts of critical systems, such as Blackboard, that we don't normally do during the middle of a quarter. Some are dead simple, such as changing a single .ini entry. Still others require our developers to compile new code with the new address built in, and publish the updated code to the production web servers.

Of course, the primary sysadmin for the old mail-server was called for Federal jury-duty last week and has been in Seattle all this time. I think he comes back Monday. His grep-fu is strong enough to tell us what all relays through the old server. I don't have a login on that server so I can't try it out myself.

Changing smart-hosts is a lot of work. Once we get the key systems working through the new smart-host (Exchange 2007, as it happens), we can tell Microsoft to de-list the old mail-server from the bypass list. This hopefully will cut down the spam flow to the students to only one or two a day at most. And it will allow us to do our own authorized spamming of students through a channel that doesn't include a spam checker. Valuable!

Death of cc.wwu.edu

| No Comments | No TrackBacks
Part of the process of moving the students over to Exchange Labs is decommissioning the cc.wwu.edu domain for email. Students have been there for a loooong time, and once upon a time faculty/staff mail was there as well. We've since moved to wwu.edu for our fac/staff domain.

Next week we're turning off cc.wwu.edu for fac/staff. The students still over there will be moved slowly over to the hosted solution. The Fac/Staff users will be moved to Exchange, period.

This has created some heated feelings as there are professors who've published books and have "@cc.wwu.edu" printed in the books. I'm not sure how we're handling that, but... that's not my email system. Email addresses do tend to get stale after a while, and that's just a fact of the internet.

However, one of the guys in the office here was one of the very first people to get an email address at cc.wwu.edu way back in the dark and misty reaches of a more trusting internet. I don't know how long he had that address, but it very well could have been over 20 years. He's letting it go with a tear in his eye, but not a big one. He's one of the unlucky schmucks with his first name as his username, and it's in every. single. solitary. mail-list known to God and man. His @cc.wwu.edu account has been nothing but a spam trap for years now.

Spam stats

| No Comments | No TrackBacks
It has been a while. Some current spam stats from our border appliance:


Processed Spam Suspected Spam Attacks Zombie Blocked Allowed Viruses Suspected Virus Worms Unscannable Malware Passworded .zip file
Content
Last 30 days
9,489,662 4,328,334 (46%) 8,895 (<> 221,926 (2%) 4,163,253 (44%) 1 (<> 10,248 (<> 1,265 (<> 1,233 (<> 2,590 (<> 26,508 (<> 0 (0%) 13 (<> 0 (0%)

That is a lot of spam. The 'Zombie' class are connections rejected due to IP reputation. Between zombies and outright spam detections, 95% of all incoming mail has been rejected over the last 30 days. Shortly after the McColo shutdown that percentage dropped a lot. We're now back to where we were before the shutdown.

Other Blogs

My Other Stuff

About this Archive

This page is an archive of entries from June 2010 listed from newest to oldest.

May 2010 is the previous archive.

July 2010 is the next archive.

Find recent content on the main index or look in the archives to find all content.