Recently in spam Category

Reputation services, a brief history

Or,

The Problem of Twitter, Hatemobs, and Denial of Service

The topic of shared blocklists is hot right now. For those who avoid the blue bird, a shared blocklist is much like a shared killfile from Ye Olde Usenet, or an RBL for spam. Subscribe to one, and get a curated feed of people you never want to hear from again. It's an idea that's been around for decades, applied to a new platform.

However, internet-scale has caught up with the technique.

Usenet

A Usenet killfile was a feature of NNTP clients where posts meeting a regex would not even get displayed. If you've ever wondered what the vengeful:

*Plonk!*

...was about? This is what it was referring to. It was a public way of saying:

I have put you into my killfile, and I am now telling you I have done so, you asshole.

This worked because in the Usenet days, the internet was a much smaller place. Once in a while you'd get waves of griefers swarming a newsgroup, but that was pretty rare. You legitimately could remove most content you didn't want to see from your notice. The *Plonk!* usage still exists today, and I'm seeing some twitter users use that to indicate a block is being deployed. I presume these are veterans of many a Usenet flame-war.

RBLs

The Realtime Blackhole Lists (RBL) were pioneered as an anti-spam technique. Mail administrators could subscribe to these, and all incoming mail-connections could be checked against it. If it was listed, the SMTP connection could be outright rejected. The assumption here was that spam comes from insecured or outright evil hosts, and that blocking them outright is better for everyone.

This was a true democratic solution in the spirit of free software: Anyone could run one.

That same sprit means that each RBL had a different criteria for listing. Some were zero tolerance, and even one Unsolicited Commercial Email was enough to get listed. Others, simply listed whole netblocks, so you could block all Cable ISPs, or entire countries.

Unlike killfiles, RBLs were designed to be a distributed system from the outset.

Like killfiles, RBLs are in effect a Book of Grudges. Subscribing to one, means subscribing to someone else's grudges. If you shared grudge-worthy viewpoints, this was incredibly labor saving. If you didn't, sometimes things got blocked that shouldn't have.

As a solution to the problem of spam, RBLs were not the silver bullet. That came with the advent of commercial providers deploying surveillance networks and offering IP reputation services as part of their paid service. The commercial providers were typically able to deploy far wider surveillance than the all-volunteer RBLs did, and as such saw a wider sample of overall email traffic. A wider sample means that they were less likely to ban a legitimate site for a single offense.

This is still the case today, though email-as-a-service providers like Google and Microsoft are now hosting the entire email stack themselves. Since Google handles a majority of all email on the planet, their surveillance is pretty good.

Compounding the problem for the volunteer-lead RBL efforts is IPv6. IPv4 was small enough you can legitimately tag the entire internet with spam/not-spam without undue resources. IPv6 is vastly larger and very hard to do comprehensively without resorting to netblock blocking. And even then, there are enough possible netblocks that scale is a real issue.

Twitter Blocklists

Which brings us to today, and twitter. Shared blocklists are in this tradition of killfiles and RBLs. However, there are a few structural barriers to this being the solution it was with Usenet:

  • No Netblocks. Which means each user has to be blocked individually, you can't block all of a network, or a country-of-origin
  • The number of accounts. Active-users is not the same as total-users. In 2013, the estimated registered-account number was around 810 million. Four years later, this is likely over a billion. It's rapidly approaching the size of the IPv4 address space.
  • Ease of setting up a new account. Changing your IP address Changing your username is very, very easy.

The lack of a summarization technique, the size of the problem space, and the ease of bypassing a block by changing your identifier mean that a shared-blocklist is a poor tool to fight off a determined hatemob. It's still good for casual griefing, where the parties aren't invested enough to break a blocklist.

The idea of the commercialized RBL, where a large company sells account-reputation services, is less possible. First of all, such an offering would likely be against the Twitter terms of service. Second, the target market is not mail-server-admins, but individual account-holders. A far harder market to monitize.

The true solution will have to come from Twitter itself. Either by liberalizing their ToS to allow those commercial services to develop, or developing their own reputation markets and content tools.

E-mail disclaimers

| 1 Comment
A nice article over at The Economist covers email disclaimers.

The long and short of it? No case anyone can remember ever hinged on them being there. And in Europe at least, such unilateral contracts are completely unenforceable. They're useless.

In my time in the computer industry, or heck anyone who does email professionally, I've seen some doozies. One memorable disclaimer had six paragraphs and lined out in detail the exact constraints placed on how that email shall be handled, the relationship established between me and the company, and expectations set by receiving such. All for an email that said, "I won't be there tonight. Maybe next month."

Clearly the person sending the email had never read their own disclaimer, since this was on a mailing list for non-work activity and that was clearly excluded by the disclaimer.

Which just goes to show, no one reads the damned things.
(Now coming to you from the East Coast of the US)

One of the things that struck me during the Cascadia IT Conference was the impact of IPv6 on IP reputation services. I've blogged about this in the past, but IP reputation is a very key spam-fighting technique. DNS RBLs have been around for over a decade now and remain the free option. In the paid anti-spam realm, the big vendors manage IP reputation databases to determine whether or not an incoming connection is worth of their time, and usually provide better granularity than the RBLs do. The same applies to blog comment-spam, as it happens.

The DNS-RBL functions very simply:

  1. A connection is made from the Internet.
  2. The mailer/blog-engine performs a lookup of the IP in the black-hole list.
  3. The RBL returns a value.
  4. The mailer/blog-engine acts on that value.
In an era where Comcast is passing out whole /64's to end-users, which in turn means end users can have more IP addresses than are available on the IPv4 Internet, this one-to-one style of lookup breaks. Obviously, a one-to-one port of the IPv4 RBL code to IPv6 will be not nearly as effective as it is with just IPv4.

The solution is fairly obvious, start blacklisting subnets, but the code-changes are non-trivial. Right now a stock RBL can be made with BIND and a standard Zone file filled with A records. Classful IPv4 subnets can be blacklisted with wildcard DNS entries. The same can be done for IPv6 zone-files, but the granularity is a lot better. Of course, RBL-clients need to be updated to handle RBL-lookups with v6 addresses.

Which is to say, that in the IPv6 future, subnet will matter more than discrete IP Address for many things. This is one of the areas that everything that relies on IP addresses for access decisions will have to start taking into consideration (as well as the people who encode the rules).

The risk of email interception

| 3 Comments
Anyone who does email knows that it is really easy to intercept in-flight. Unless TLS is in use the messages are transmitted in plain text, and the SMTP protocol is designed around the assumption that untrusted 3rd parties may handle the messages between source and destination (a holdover from the UUCP days as it happens). The appliance and cloud anti-spam industries are designed around this very capability.

But how much of a risk is illicit interception? Or worse, monitoring? Everyone knows you don't send passwords or credit-card information in email, but we also send password reset messages in email. Some web-sites still send your password when asked for a 'reminder', so clearly some reset-system designers consider email secure-enough. Or maybe it's just convenience trumping security again.

To figure out how much of a risk it is we need to know 2 things:
  1. How can email be intercepted?
  2. How likely these methods are to be used?
Interception can be accomplished two ways:
  1. Catch the messages in-flight by way of a sniffer.
  2. Catch the messages in the mail-spools of the mailers handling the message.
There is another vector that is even more damaging, though. Catching the message in the final mailbox. That isn't interception, it's something else, but it really impacts the security of email so I'll be including it. Under the fold.

Guaranteed delivery of email

Simply put, can't be done.

Even in the pre-spam Internet, SMTP was designed around the fact that email delivery is fundamentally best-effort. Back then it was more about handling network outages, but the fact remains that by design email is best-effort not guaranteed.

That doesn't stop people from demanding it, though.

More recently, with the Virginia Tech shootings a lot (I'd even go so far as to say most) of Higher Ed has taken a really close look at emergency alerting systems. WWU is not immune to this. We have an outside entity that handles this, and WWU upper administration has asked that such emails NOT end up in the Junk Mail folder (we also have an SMS alerting system to go along side this). This is harder than they think, which just makes my life less pleasant every time such a notice does end up in a Junk Mail folder.

With spam making up anywhere from 92-98% of all incoming email, email is fundamentally lossy these days. We LIKE it that way. The hard part is picking the good stuff out of the sea of bad stuff. And unfortunately, there is no one way to guarantee email WILL be seen by the recipient.

The most recent major junking event was because our outside mailing entity changed what servers they do their mailing from, which meant they weren't getting the benefit of the IP-whitelist. Either they didn't notify us, or the people who communicate with them didn't realize that was important and therefore didn't send the change notice to me. The fact that the message in question was a simple web-page copy with a lot of hyper-linked images just made it extra-spammy.

The WWU marketing department has been having their "WWU News" messages, emails with lots of links including mentions of WWU and links to WWU events, end up in Junk Mail about 80% of the time, even though the service they send through is ALSO whitelisted.

The one thing that makes my life all too interesting when attempting to guarantee email delivery? Outlook's junkmail filters. We can't do a thing with them, and Microsoft purposely makes them hard to predict. Nearly all of the junkings I end up troubleshooting end up being Outlook independently deciding it was crap and binning it. I can guarantee delivery right up to the point where Outlook analyzes new messages for spam-factors, but once it gets there all bets are off.

Unfortunately.

I can't guarantee email delivery. I never could, but it's harder now.
One of the trickier things we're dealing with these days are multi-function-devices. Or in specific, copiers that can email or save-to-server PDF/BMP/JPG/TIFF images of documents. You'd think this would be easy, but no.

On the one hand, we COULD just let these devices email blindly. Which would allow anonymous users to send butt-scans to the University President, a thing we generally try to avoid.

Or we could configure them with a user ID and password to send authenticated SMTP messages, and still butt-scan bomb the University President.

Ideally every one would have a specific login on these devices so there would be a full audit trail. This kind of thing can be done, but there are caveats. Swipe-card systems require all the devices to use the same back-end processor, and probably mean all the devices come from the same company. We can't use our universal login and passwords since that would require a full keyboard on these devices, and that just isn't going to happen any time soon.

The solution we've come up with isn't a good one, but it's still better than banning the enhanced functionality of these devices. Scanning can indeed reduce the amount of paper we push around. We're disabling the butt-scan vulnerability and forcing potential butt-scanners to drop the scans on a specific file-share where they'll have to forward it from a real email account.

I suspect "email my PDF" will be disabled until such time as we get a card-swipe system, or some other way to individually authenticate copier users.

It's the little things

| 1 Comment
My attention was drawn to something yesterday that I just hadn't registered before. Perhaps because I see it so often I didn't twig to it being special in just that place.

Here are the Received: headers of a bugzilla message I got yesterday. It's just a sample. I've bolded the header names for readability:
Received: from ExchEdge2.cms.wwu.edu (140.160.248.208) by ExchHubCA1.univ.dir.wwu.edu (140.160.248.102) with Microsoft SMTP Server (TLS) id 8.1.393.1; Tue, 15 Sep 2009 13:58:10 -0700
Received: from mail97-va3-R.bigfish.com (216.32.180.112) by
ExchEdge2.cms.wwu.edu (140.160.248.208) with Microsoft SMTP Server (TLS) id 8.1.393.1; Tue, 15 Sep 2009 13:58:09 -0700
Received: from mail97-va3 (localhost.localdomain [127.0.0.1]) by mail97-va3-R.bigfish.com (Postfix) with ESMTP id 6EFC9AA0138 for me; Tue, 15 Sep 2009 20:58:09 +0000 (UTC)
Received: by mail97-va3 (MessageSwitch) id 12530482889694_15241; Tue, 15 Sep 2009 20:58:08 +0000 (UCT)
Received: from monroe.provo.novell.com (monroe.provo.novell.com [137.65.250.171]) by mail97-va3.bigfish.com (Postfix) with ESMTP id 5F7101A58056 for me; Tue, 15 Sep 2009 20:58:07 +0000 (UTC)
Received: from soval.provo.novell.com ([137.65.250.5]) by
monroe.provo.novell.com with ESMTP; Tue, 15 Sep 2009 14:57:58 -0600
Received: from bugzilla.novell.com (localhost [127.0.0.1]) by soval.provo.novell.com (Postfix) with ESMTP id A56EECC7CE for me; Tue, 15 Sep 2009 14:57:58 -0600 (MDT)
For those who haven't read these kinds of headers before, read from the bottom up. The mail flow is:
  1. Originating server was Bugzilla.novell.com, which mailed to...
  2. soval.provo.novell.com running Postfix, who forwarded it on to Novell's outbound mailer...
  3. monroe.provo.novell.com, who attempted to send to us and sent to the server listed in our MX record...
  4. mail97-va3.bigfish.com running Postfix, who forwarded it on to another mailer on the same machine...
  5. mail97-ca3-r running something called MessageSwitch, who sent it on to the internal server we set up...
  6. exchedge2.cms.wwu.edu running Exchange 2007, who send it on to the Client Access Server...
  7. exchhubca1.univ.dir.wwu.edu for 'terminal delivery'. Actually it went on to one of the Mailbox servers, but that doesn't leave a record in the SMTP headers.
Why is this unusual? Because steps 4 and 5 are at Microsoft's Hosted ForeFront mail security service. The perceptive will notice that step 4 indicates that the server is running Postfix.

Postfix. On a Microsoft server. Hur hur hur.

Keep in mind that Microsoft purchased the ForeFront product line lock stock and barrel. If that company had been using non-MS products as part of their primary message flow, then Microsoft probably kept that up. Next versions just might move to more explicitly MS-branded servers. Or not, you never know. Microsoft has been making placating notes towards Open Source lately. They may keep it.

Exchange Transport Rules, update

| 2 Comments
Remember this from a month ago? As threatened in that post I did go ahead and call Microsoft. To my great pleasure, they were able to reproduce this problem on their side. I've been getting periodic updates from them as they work through the problem. I went through a few cycles of this during the month:

MS Tech: Ahah! We have found the correct regex recipe. This is what it is.
Me: Let's try it out shall we?
MS Tech: Absolutely! Do you mind if we open up an Easy Assist session?
Me: Sure. [does so. Opens sends a few messages through, finds an edge case that the supplied regex doesn't handle]. Looks like we're not there yet in this edge case.
MS Tech: Indeed. Let me try some more things out in the lab and get back to you.

They've finally come up with a set of rules to match this text definition: "Match any X-SpamScore header with a signed integer value between 15 and 30".

Reading the KB article on this you'd think these ORed patterns would match:
^1(5|6|7|8|9)$
^2\d$
^30$
But you'd be wrong. The rule that actually works is:
(^1(5$|6$|7$|8$|9$))|(^2(\d$))|(^3(0$))
Except if ^-
Yes, that 'except if' is actually needed, even though the first rule should never match a negative value. You really need to have the $ inside the parens for the first statement, or it doesn't match right; this won't work: ^1(5|6|7|8|9)$. The same goes for the second statement with the \d$ constructor. The last statement doesn't need the 0$ in parens, but is there to match the pattern of the previous two statements of having the $ in the paren.

Riiiiiight.

In the end, regexes in Exchange 2007 Transport Rules are still broken, but they can be made to work if you pound on them enough. We will not be using them because they are broken, and when Microsoft gets around to fixing them the hack-ass recipes we cook up will probably break at that time as well. A simple value list is what we're using right now, and it works well for 16-30. It doesn't scale as well for 31+, but there does seem to be a ceiling on what X-SpamScore can be set to.

Exchange transport-rules

| 1 Comment
Exchange 2007 supports a limited set of Regular Expressions in its transport-rules. The Microsoft technet page describing them is here. Unfortunately, I believe I've stumbled into a bug. We've recently migrated our AntiSpam to ForeFront. And part what ForeFront does is header markup. There is a Spamminess number in the header:
X-SpamScore: 66
That ranges from deeply negative to over a hundred. With this we can structure transport-rules to handle spammy email. In theory, the following trio of regexes should catch anything with a score of 15 or higher:
^1(5|6|7|8|9)$
^(2|3|4|5|6|7|8|9)\d$
^\d\d\d$
Those of you that speak Unix regex are quirking an eyebrow at that, I know. Like I said, Microsoft didn't do the full Unix regex treatment. The "\d" flag, "matches any single numeric digit." The parenthetical portion, "Parentheses act as grouping delimiters," and, "The pipe ( | ) character performs an OR function."

Unfortunately, for reasons that do not match the documentation the above trio of regexes is returning true on this:
X-SpamScore: 5
It's the second recipe that's doing it, and it looks to be the combination of paren and \d that's the problem. For instance, the following rule:
^\d(6|7)$
returns true for any single numeric value, but returns false for "56". Where this rule:
^5(6|7)
only returns true for 56 and 57. To me this says there is some kind of interaction going on between the \d and the () constructors that's causing it to change behavior. I'll be calling Microsoft to see if this is working as designed and just not documented correctly, or a true bug.

Email reputation

One of the hot new things in anti-spam technology is something that's rather old. Yes, the Realtime Blackhole List is back. Only these RBL's aren't the old school DNS servers of yesteryear, these RBLs are maintained by the big anti-spam vendors and are completely proprietary. The new name is now, "IP Reputation," and that's showing up on the marketing glossies.

The idea is that you deploy a network of sensors (say, every anti-spam appliance you ship, or software-package installed) that relay spam/ham information back to home base. Home base then builds a profile of the behaviors of the incoming IP connections. Once certain completely proprietary threshold are crossed, the anti-spam vendor then publishes that particular IP addresses reputation to their service. The installed base then queries the reputation service on every incoming TCP connection to see how to handle that connection.

The response varies from vendor to vendor, but include:
  • Outright blocking. Do not accept traffic from this IP address. The connection is terminated before any SMTP commands can be issued. Do not pass EHLO. Do not collect 220-ESMTP.
  • Deferr. Issue a 421 error message. Smart mailers will attempt redelivery later. Bots are generally too stupid to try this and just pass on to the next address on their list.
  • Throttle. Get very slow in accepting mail. Take a long time to issue 250-Ready statuses after SMTP commands.
The nice thing about IP reputation is that it is fast and cheap. Instead of having to lexically scan every incoming email for spamminess, you can just look at the source's reputation and block a very large percentage of messages. When we turned this on for our spam product a while back, the reputation filter blocked between 90% to 95% of all messages ultimately blocked as spam. Clean email is the single most expensive mail to pass since it has to go through every single stage of the spam/ham test pipeline, and blocking things earlier in the pipeline is a good way to shed load.

Not all optimizations are without side effects, and this one wasn't. The former student email server, titan, got itself 'greylisted' due to spam quantities. Around 50% of the message traffic into Exchange from this system was ultimately blocked as Spam according to the old anti-spam appliances we had (we'd routed its mail through the 'outbound' queue on those appliances so it wouldn't be subject to reputation tests, but would still scan email). As part of the migration of student email to OutlookLive.Edu, we set up forwards from the old cc.wwu.edu addresses to the new addresses. The spam-checkers on titan were of poor enough quality that enough spam got through to cause OutlookLive to start grey-listing Titan, causing mail to really back up on it.

That's not the only thing. Certain mailers managed by departments other that ITS here at WWU have managed to get themselves greylisted or outright blacklisted on these proprietary reputation lists. The one common denominator we've found is that certain specific UNIXy mailers do not apply their anti-spam processes to mail that is subjected to a .forward. At least, not without specific config telling it to scan that traffic. So if a person on one of these mailers has a .forward sending all mail into Exchange, the full spam-filled feed heads to Exchange and the reptuation of that mailer gets dinged.

Which is a long way of saying that, ahem:

In this era of IP reputation, outbound spam filtering is now just as required as inbound.

Really. Go do it. It'll help prevent blacklistings, and that sucks for anyone subjected to it.