Re: Spam filtering on the mailing lists

From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-www(at)postgresql(dot)org
Subject: Re: Spam filtering on the mailing lists
Date: 2008-07-17 15:54:41
Message-ID: d7865a888f6738c980a725cbe912f8c3@biglumber.com
Lists: pgsql-www


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> Its sad how this is such an ongoing problem, but this is the first that I hear
> that ppl are having problems ... looking at the message headers for a random
> few, I notice that they are scoring >4, but just below 5:
...
> I can change the quarantining to be >4 if ppl want, which should greatly reduce
> the # of messages going through ...

I think that would be a good start, but there are definitely some other problems.
First, the example you gave:

> X-Spam-Status: No, hits=4.855 tagged_above=0 required=5 tests=AWL=-1.994,
> DCC_CHECK=1.37, DIGEST_MULTIPLE=0.001, HTML_MESSAGE=0.001,
> MIME_HTML_ONLY=1.672, RAZOR2_CHECK=0.5, RCVD_IN_BL_SPAMCOP_NET=2.188,
> RCVD_IN_SORBS_WEB=1.117

A score of 0.001 for HTML_MESSAGE? Might as well not have the check at all. Same
with things like DIGEST_MULTIPLE. I think we need more checks, and much higher
scores for many of them.
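
As a rough illustration of how that verdict is reached: the hits value is simply the sum of the per-test scores, compared against the required value. The sketch below parses the header quoted above and checks the arithmetic. The function name parse_spam_status is made up for this example, and the regular expressions assume the amavisd-style "hits=... required=... tests=..." layout shown here, not every SpamAssassin header variant.

```python
import re

def parse_spam_status(header):
    """Parse an amavisd-style X-Spam-Status value into (hits, required, tests).

    Hypothetical helper: assumes the 'hits=.. required=.. tests=A=1,B=2' layout.
    """
    flat = " ".join(header.split())  # unfold continuation lines
    hits = float(re.search(r"hits=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    tests = {}
    for item in flat.split("tests=", 1)[1].split(","):
        name, _, score = item.strip().partition("=")
        tests[name] = float(score)
    return hits, required, tests

# The header value quoted above, minus the "X-Spam-Status:" prefix.
EXAMPLE = """No, hits=4.855 tagged_above=0 required=5 tests=AWL=-1.994,
 DCC_CHECK=1.37, DIGEST_MULTIPLE=0.001, HTML_MESSAGE=0.001,
 MIME_HTML_ONLY=1.672, RAZOR2_CHECK=0.5, RCVD_IN_BL_SPAMCOP_NET=2.188,
 RCVD_IN_SORBS_WEB=1.117"""

hits, required, tests = parse_spam_status(EXAMPLE)
total = sum(tests.values())

# The two 0.001 tests contribute almost nothing to the total.
negligible = sorted(name for name, score in tests.items() if abs(score) < 0.01)

if __name__ == "__main__":
    print(f"sum of tests = {total:.3f}, reported hits = {hits}, required = {required}")
    print("negligible tests:", ", ".join(negligible))
```

The negative AWL (auto-whitelist) adjustment of -1.994 is what keeps this message under 5. Without it the remaining tests add up to 6.849.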

I grabbed a few random messages from the bugs list last night. Most interesting
was that some had no X-Spam-Status headers at all - does this mean they slipped
through the spam filtering entirely? Here's one of them:

===
Return-Path: <owner-pgsql-bugs-postgresql(dot)org(at)postgresql(dot)org>
Delivered-To: pgsql-bugs-postgresql(dot)org(at)postgresql(dot)org
Received: from localhost (unknown [200.46.204.183])
by postgresql.org (Postfix) with ESMTP id C3148650275
for <pgsql-bugs-postgresql(dot)org(at)postgresql(dot)org>; Wed, 16 Jul 2008 15:40:45 -0300 (ADT)
Received: from postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 48600-04-3 for <pgsql-bugs-postgresql(dot)org(at)postgresql(dot)org>;
Wed, 16 Jul 2008 15:40:43 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from wwwmaster.postgresql.org (wwwmaster.postgresql.org [217.196.146.204])
by postgresql.org (Postfix) with ESMTP id AB1D565026D
for <pgsql-bugs(at)postgresql(dot)org>; Wed, 16 Jul 2008 15:40:44 -0300 (ADT)
Received: from wwwmaster.postgresql.org (wwwmaster.postgresql.org [217.196.146.204])
by wwwmaster.postgresql.org (8.13.8/8.13.8) with ESMTP id m6GIehuA007983
for <pgsql-bugs(at)postgresql(dot)org>; Wed, 16 Jul 2008 18:40:43 GMT
(envelope-from www(at)wwwmaster(dot)postgresql(dot)org)
Received: (from www(at)localhost)
by wwwmaster.postgresql.org (8.13.8/8.13.8/Submit) id m6GIehIP007982;
Wed, 16 Jul 2008 18:40:43 GMT
(envelope-from www)
Date: Wed, 16 Jul 2008 18:40:43 GMT
Message-Id: <200807161840(dot)m6GIehIP007982(at)wwwmaster(dot)postgresql(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4310: PkMERMInZQ
From: "make money on line" <makemoney(at)money2009(dot)com>
Content-Type: text/plain; charset=utf-8
X-Virus-Scanned: Maia Mailguard 1.0.1

The following bug has been logged online:

Bug reference: 4310
Logged by: make money on line
Email address: makemoney(at)money2009(dot)com
PostgreSQL version: IUrjkiPgQkQXNgo
Operating system: aJzBuaSGetA
Description: PkMERMInZQ
Details:

<a href=" http://www.divinecaroline.com/public/user/profile?user_id=83997
">work at home jobs 101waystoincome.com</a>

===

Did it get whitelisted because it came from our form? I still think we
should scan it - the "make money on line" sender name is a dead giveaway,
and when I ran a local spamassassin on it, I even found:

2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: 101waystoincome.com]
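
To get a feel for how many messages bypass scanning, one could check a batch of raw messages for a missing X-Spam-Status header. This is only a sketch using Python's standard email module. The helper names and the two cut-down sample messages are invented for the example, and the addresses are placeholders.

```python
from email import message_from_string

def lacks_spam_status(raw_message):
    """Return True if the message carries no X-Spam-Status header at all."""
    return message_from_string(raw_message)["X-Spam-Status"] is None

def unscanned_subjects(raw_messages):
    """Collect the Subject of every message that has no X-Spam-Status header."""
    return [message_from_string(raw)["Subject"]
            for raw in raw_messages if lacks_spam_status(raw)]

# Cut-down, hypothetical samples: one without the header, one with it.
FORM_MESSAGE = (
    "To: pgsql-bugs@example.org\n"
    "Subject: BUG #4310: PkMERMInZQ\n"
    "X-Virus-Scanned: Maia Mailguard 1.0.1\n"
    "\n"
    "The following bug has been logged online:\n"
)
SCANNED_MESSAGE = (
    "To: undisclosed-recipients:;\n"
    "Subject: REMINDER NOTIFICATION\n"
    "X-Spam-Status: No, hits=1.806 tagged_above=0 required=5\n"
    " tests=SUBJ_ALL_CAPS=1.806\n"
    "\n"
    "REMINDER NOTIFICATION\n"
)

if __name__ == "__main__":
    for subject in unscanned_subjects([FORM_MESSAGE, SCANNED_MESSAGE]):
        print("no X-Spam-Status:", subject)
```

For a real list archive, the same check could be run over each message yielded by mailbox.mbox from the standard library.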

Here's another one from last night that did have a spam header. I apologize
for how long this post is getting, but I'm trying to provide some hard data:
===

Return-Path: <owner-pgsql-hackers-postgresql(dot)org(at)postgresql(dot)org>
Delivered-To: pgsql-hackers-postgresql(dot)org(at)postgresql(dot)org
Received: from localhost (unknown [200.46.204.183])
by postgresql.org (Postfix) with ESMTP id AFB3A64FD01
for <pgsql-hackers-postgresql(dot)org(at)postgresql(dot)org>; Wed, 16 Jul 2008 23:15:20 -0300 (ADT)
Received: from postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.204.183]) (amavisd-maia, port 10024)
with ESMTP id 35883-07 for <pgsql-hackers-postgresql(dot)org(at)postgresql(dot)org>;
Wed, 16 Jul 2008 23:15:11 -0300 (ADT)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from smtp1web.tin.it (smtp1web.tin.it [212.216.176.195])
by postgresql.org (Postfix) with ESMTP id 8ECBB64FCE4
for <pgsql-hackers(at)postgresql(dot)org>; Wed, 16 Jul 2008 23:15:17 -0300 (ADT)
Received: from pswm6.cp.tin.it (192.168.70.26) by smtp1web.tin.it (8.0.016.5)
id 48623AD8015C5727; Thu, 17 Jul 2008 03:59:43 +0200
Message-ID: <11b2ebe81d4(dot)clementetajana(at)virgilio(dot)it>
Date: Thu, 17 Jul 2008 02:59:41 +0100 (GMT+01:00)
From: "Tajana for(Mrs. Lucy Berg)" <clementetajana(at)virgilio(dot)it>
Reply-To: cpinans(at)users(dot)sourceforge(dot)net
Subject: REMINDER NOTIFICATION
Mime-Version: 1.0
Content-Type: text/plain;charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Originating-IP: 62.163.243.54
To: undisclosed-recipients:;
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=1.806 tagged_above=0 required=5
tests=SUBJ_ALL_CAPS=1.806
X-Spam-Level: *

REMINDER NOTIFICATION

This email is to notify you that your Email
Address attached to a
Ticket Number(140408) has won an Award Sum of
($500,000.00)(Five
Hundred Thousand Dollars)In an Email Sweepstakes
program held in
The Netherlands these year 2008.Please contact the
claim officer
through the below given contact information.

MR.HANSON
CHRIS.
TEL. +31-643-502-787.
FAX: +31-847-290-539.
E-mail:cpinans(at)aol(dot)
nl

WINNING INFORMATIONS
Ref Number:Nl50286
lucky Numbers:
07,12,24,36,45
Batch Number:EU-175508
Ticket Number:360208

Please
forward the above stated winning information to your Claim
Agent and do
include the following,

Your Name:
Telephone Number:

Congratulations!!!

Yours Sincerely,
Mrs. Lucy Berg.
Public Relation
Officer.

===

The only spam trigger found by postgresql.org was:

X-Spam-Status: No, hits=1.806 tagged_above=0 required=5
tests=SUBJ_ALL_CAPS=1.806

There are numerous triggers in the body of the email that should
have pushed the score higher. Personally, I'd also like to see
SUBJ_ALL_CAPS raised to 3 or 4.

So, to reiterate, I'd like to request the following:

1) Spam filtering is run on all messages
2) The default rejection threshold is lowered from 5 to 4 or less
3) The values get raised significantly for some tests
4) More SA tests get added (are we at least cronning sa-update?)
5) If 3 and 4 are too much trouble to maintain, outsource the
filtering to someone who does have the time, or who specializes
in it (economies of scale)
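
A quick what-if on points 2 and 3, using the two scored messages from this post. This is only a sketch: rescore and is_spam are made-up helper names, the override values are the 3 and 4 suggested above for SUBJ_ALL_CAPS, and it assumes a message is caught when its score is greater than or equal to the threshold.

```python
def rescore(tests, overrides):
    """Sum the test scores, replacing any test listed in overrides."""
    return sum(overrides.get(name, score) for name, score in tests.items())

def is_spam(score, required):
    """Assumed rule: a message is caught when score >= required."""
    return score >= required

# Per-test scores copied from the two X-Spam-Status headers in this post.
HTML_SPAM = {
    "AWL": -1.994, "DCC_CHECK": 1.37, "DIGEST_MULTIPLE": 0.001,
    "HTML_MESSAGE": 0.001, "MIME_HTML_ONLY": 1.672, "RAZOR2_CHECK": 0.5,
    "RCVD_IN_BL_SPAMCOP_NET": 2.188, "RCVD_IN_SORBS_WEB": 1.117,
}
LOTTERY_SPAM = {"SUBJ_ALL_CAPS": 1.806}

if __name__ == "__main__":
    # Point 2 alone: lowering the threshold from 5 to 4 catches the first message.
    score = rescore(HTML_SPAM, {})
    print("HTML spam:", is_spam(score, 5), "->", is_spam(score, 4))

    # Point 3: the lottery message only reaches 4 if SUBJ_ALL_CAPS is raised to 4.
    for caps_score in (3.0, 4.0):
        score = rescore(LOTTERY_SPAM, {"SUBJ_ALL_CAPS": caps_score})
        print("lottery spam at", caps_score, "->", is_spam(score, 4))
```

With SUBJ_ALL_CAPS at 3 the lottery message still gets through a threshold of 4, which is why point 4 (more tests that fire on the body) matters as well.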

I did #5 myself years ago, after getting tired of updating SA rules,
messing with DNS lookups, blacklists, etc. and now just let
maillaunder.com handle it all.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200807171149
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkh/atsACgkQvJuQZxSWSsjKKwCg4Pc0SNrYjfUZuJRQZjU6jDHR
oc0An0vTdKzfIJ3+CxQXpw7TZyWu0Tb6
=a3/E
-----END PGP SIGNATURE-----
