[BBLISA] State of spam filtering?

Rich Braun richb at pioneer.ci.net
Wed May 22 01:57:11 EDT 2013


Tom Metro inquired:
> Or have you found that the cost increase for business-class service
> exceeds the cost of these other services?

I have a 50Mbps / 5Mbps residential cable service priced at $39.95/mo (from a
competitor of Comcast).  The equivalent Comcast business service is
$199.95/mo.

My supplemental services are:

  MailJet:  free (for up to something like 200 relays/day)
  DynDNS:   $18/year
  SendLabs MailHop: $20/year (discontinued once I got MailJet)
  CrashPlan: about $35/year
  DNS reg & hosting:  about $10-20/year

So the supplemental services add up to something like $5 a month, vs. the
$160/mo extra for biz-class service.  I get to pick and choose best-of-breed
services -- unsubscribing whenever I'm dissatisfied -- instead of suffering
through the bundled service of an incumbent provider.
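
As a rough check of that arithmetic, here is a minimal sketch. The $15 DNS
figure is an assumed midpoint of the $10-20 range, and MailHop is left out
because it was discontinued; everything else is the prices quoted above.

```python
# Annual prices quoted above, in USD.
supplemental_per_year = {
    "MailJet": 0.00,             # free tier
    "DynDNS": 18.00,
    "CrashPlan": 35.00,          # "about $35/year"
    "DNS reg & hosting": 15.00,  # assumption: midpoint of the $10-20 range
}

residential_per_month = 39.95
business_per_month = 199.95

# Convert the yearly add-ons to a monthly figure and compare with the
# monthly premium for business-class service.
supplemental_per_month = sum(supplemental_per_year.values()) / 12
business_premium = business_per_month - residential_per_month

print(f"supplemental: ${supplemental_per_month:.2f}/mo")
print(f"business-class premium: ${business_premium:.2f}/mo")
```

With these inputs the add-ons land between $5 and $6 a month, consistent
with the "something like $5" estimate.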

> (Business class also buys you a lack of port blocking for other
> services, and usually no data caps.)

My data cap is about 300 GB/mo, which is way more than I would need even if I
were running a real business.  But all I really have is a calling-card website
and "personal brand" email, with no direct revenue going through this.

>> 1) Inbound forwarders
>> For item #1, forwarders, much inbound spam gets caught by the relatively
>> lightweight rules of the DNS provider I use, EasyDNS, which also includes
>> port-25 remapping as a free service.
>
> What exactly does this service do?
> Presumably it accepts mail on standard port 25 and forwards it to your
> server on some non-standard port, but is it acting as a typical
> store-and-forward MX

Correct.  It's just a store-and-forward, not a proxy.

> What anti-spam rules does it provide?

They don't say, but I presume it looks at most of the reliable RBLs.
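
For readers unfamiliar with how an RBL (DNSBL) check works in general, here
is a minimal sketch of the lookup name construction.  This illustrates the
generic convention only; it is not a description of what EasyDNS actually
runs, and "dnsbl.example.org" is a placeholder zone, not a real blocklist.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 client.

    By convention the IPv4 octets are reversed and the blocklist zone is
    appended; an A-record answer means "listed", NXDOMAIN means "not listed".
    """
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Placeholder zone and a documentation-range address, for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The sketch stops short of the actual DNS query so that it runs without
network access; resolving the returned name is the remaining step.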

> you decrease their effectiveness,
> because you no longer have a direct connection with the client, and thus
> don't know their IP address and can't use more sophisticated techniques,
> like passive OS fingerprinting.

Years ago I spent weeks developing a custom set of exim rules to implement
then-latest techniques; what I liked about exim was the ability to craft
filters, conditionals and delays at every stage of the SMTP protocol (with
finer detail than is possible with postfix).  But I long since switched to
postfix.

> Does it do anything to compensate for the loss of metadata, like passing
> on the client's IP address? ...

Your posting got me to wonder:  I switched ISPs well over a year ago, and
never bothered checking to see if port 25 is blocked.  Just tried it this
evening, and ohmigosh, I have an unblocked consumer-grade connection.  So now
I'm experimenting with direct connection.  (Alas, I have to rethink my HA
config:  at the moment I have haproxy 1.4 and postfix 2.9 installed--they
aren't that old, but in order to pass the origin address through to my
Spamassassin rules, I'll have to upgrade both.)
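
The usual way for a load balancer to hand the origin address to the server
behind it is the PROXY protocol, where a one-line header is sent ahead of
the real traffic.  Whether that is exactly what the upgrade above would
enable is an assumption here, so check the haproxy and postfix release
notes.  A minimal sketch of parsing the version 1 (text) header, with
hypothetical names throughout:

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxyInfo]:
    """Parse a PROXY protocol v1 header line; return None for UNKNOWN."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    if len(parts) >= 2 and parts[1] == "UNKNOWN":
        return None  # the proxy could not determine the original endpoints
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    family, src, dst, sport, dport = parts[1:]
    ipaddress.ip_address(src)  # validate; raises ValueError if bogus
    ipaddress.ip_address(dst)
    return ProxyInfo(family, src, dst, int(sport), int(dport))


# Documentation-range addresses, for illustration only.
info = parse_proxy_v1(b"PROXY TCP4 192.0.2.10 198.51.100.5 51234 25\r\n")
print(info.src_ip, info.dst_port)
```

The point of the header is that `src_ip` is the real client address, which
is what any per-IP anti-spam rule needs to see instead of the balancer's
own address.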

> After many years of requests, Covad eventually complied with my request
> and provided a customized PTR record.

That's why I smart-host outbound through companies which deal with that
problem.  Deliverability is still somewhat challenging but I think I'm
reaching well over 99% of correspondents and no longer feel like I have to
test it every few months.

> So in your experience they are sifting out all the IPs and looking for
> any that are on known blocks of IPs that are handed out dynamically to
> consumer net connections?

It sure seems that way.  Deliverability goes way down if any Received header
contains the IP of a consumer-grade ISP like the one I have (or of a cloud
service: the same problem exists if you try to set up email using VMs at
Rackspace, AWS or related services).
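
To make the suspected behavior concrete, here is a minimal sketch of a
filter that walks every Received header and flags bracketed IPv4 addresses
falling inside a list of "dynamic" ranges.  How receiving sites really do
this is not known; the range list below is a hypothetical stand-in built
from documentation-only prefixes.

```python
import email
import ipaddress
import re
from typing import List

# Hypothetical "consumer/dynamic" ranges; a real filter would load a
# maintained list instead of this documentation-only prefix.
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Matches the "[a.b.c.d]" form commonly written into Received headers.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def suspect_received_ips(raw_message: str) -> List[str]:
    """Return every Received-header IP that falls in a suspect range."""
    msg = email.message_from_string(raw_message)
    hits = []
    for header in msg.get_all("Received", []):
        for candidate in BRACKETED_IPV4.findall(str(header)):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. an octet above 255
            if any(addr in net for net in SUSPECT_NETWORKS):
                hits.append(candidate)
    return hits


sample = (
    "Received: from relay (relay.example.net [198.51.100.7])\r\n"
    "\tby mx.example.org; Wed, 22 May 2013 01:57:11 -0400\r\n"
    "Received: from home (cpe.example.com [203.0.113.45])\r\n"
    "\tby relay.example.net; Wed, 22 May 2013 01:57:10 -0400\r\n"
    "Subject: test\r\n"
    "\r\n"
    "body\r\n"
)
print(suspect_received_ips(sample))
```

Note that the flagged address sits in the oldest header, not the one added
by the final relay, which is why a relay that strips earlier Received
headers sidesteps this kind of check.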

> How are they not getting tripped up by users who mail through their
> ISP's relay?

They are.  You can't just set up smart-hosting of your personal domain through
Comcast and expect deliverability above something like 95%; at least 5% of
your outbound will be silently discarded.

> What's stopping you from using your ISP provided relay?

See above.  (But there's also this:  privacy rights are non-existent, and I've
been the "subpoena response department" at a couple of network providers in my
day.  Users are never notified of subpoenas anymore; your ISP will turn
everything over without your ever knowing.  I "don't have anything to hide"
but I still don't like the fact that businesses and government entities alike
can grab logs and even contents of my correspondence on a whim.  Hence most of
my correspondence goes overseas before reaching its destination, and my ISP's
mitts are kept off it unless they record and crack SSL data streams.)

> On the copy of the message you sent me directly, the first (oldest)
> Received header was added by Google, so it seems Mailjet is stripping
> all Received headers, including its own (which I guess would be the only
> received header if that was the first hop from your mail client).

You got it.  If you find another mail relay provider that does that, I'm all
ears; I'd like to have a second for redundancy.

> I haven't looked at Spamassassin since the early 2000s and my impression
> was that it was pretty ugly. Have they cleaned up the code and config
> since then?

Hah.  Like a lot of older open-source projects, it's old and crufty.  But it's
a niche that not too many new open-source developers want to tackle.

> Is Spamassassin invoked to analyze the metadata of a message as it is
> being received, ...or is it still being used in a model
> where the message gets spooled to disk [?]

Well, now that I'm able to receive messages directly on port 25, I can
configure it earlier in the SMTP protocol.  But in recent years I've just been
using Spamassassin as a post-processing filter to catch things that the
EasyDNS service wasn't catching.

> Most of the other stuff, like consulting RBLs or SPF, seems like it
> should be doable via lighter weight Sender Policy Framework services in
> Postfix.

I suppose Spamassassin could be considered "heavy weight" but it's capable of
handling tens of thousands of messages per hour, per server, and it's trivial
to horizontally scale inbound mail relays.  Electric power where I live (city
limits of San Francisco) is an insane $.35/kWh, so I only run two physical
multicore servers these days, but that means I can run at least two VMs of
most any application I want so there's enough headroom that I could run a
good-sized ISP out of the house if I wanted.  (No, I don't. ;-)

> If you change your focus from looking for spam-like behavior to strictly
> identifying the sender, as Google does,

Well, wouldn't I love it if email could be whitelisted for everyone that I
know (indeed, that's one of the rules I have in Spamassassin:  a whitelist
lookup against my personal email database).  But there's no reliable online
registry of good-guys email accounts/domains, and most of us want to be able
to hand out our email address to new acquaintances without requiring them to
provide white-listing information about themselves.  It's a tough problem, and
there's no 100% solution.  So I accept the need to delete a handful of spams
per week, knowing that thousands of them were automatically removed before
reaching me.
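
The whitelist idea can be sketched in a few lines.  This is not the actual
Spamassassin rule mentioned above, just an illustration of the lookup; the
address set is a hypothetical stand-in for a personal email database.

```python
from email.utils import parseaddr

# Hypothetical address book; a real setup would query a database.
KNOWN_CORRESPONDENTS = {"alice@example.org", "bob@example.com"}


def is_known_sender(from_header: str) -> bool:
    """Return True if the From header's address is in the address book."""
    _display_name, addr = parseaddr(from_header)
    return addr.lower() in KNOWN_CORRESPONDENTS


print(is_known_sender("Alice <Alice@Example.org>"))
print(is_known_sender("Stranger <someone@example.net>"))
```

Since a From header is trivial to forge, a match like this is better used
to lower a spam score than to bypass filtering outright.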

Getting back to the original question, has the state of the art advanced?  Not
that I've seen.  The open-source tools seem to be about what they were 10
years ago, and the commercial alternatives have all moved to proprietary cloud
services.  The good news about this predicament is that spammers no longer
have much incentive to target the shrinking handful of people who rely on the
open-source tools, so I'm no longer losing the once-escalating battle.

-rich









