Re: [OT] Tom's/Marc's spam filters?

From: Joe Conway <mail(at)joeconway(dot)com>
To: Michael Chaney <mdchaney(at)michaelchaney(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [OT] Tom's/Marc's spam filters?
Date: 2004-04-21 18:00:06
Message-ID: 4086B6A6.5000306@joeconway.com
Lists: pgsql-general

Michael Chaney wrote:
> Make sure you have the latest SA and make sure that Bayesian filtering
> is turned on and working, and make sure to train the filter. Reply to
> me offlist if you need a group of 5000 or so spams to help train it.

I've got the latest SA and I'm using Bayesian filtering, autolearn,
razor2, dcc, and pyzor. I'm also using relays.ordb.org,
sbl.spamhaus.org, bl.spamcop.net, and blackholes.five-ten-sg.com
(although I just added that last one yesterday). I've verified that
autolearn is working. I have lowered my threshold from the default of
5.0 to 2.5.
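
One way to check that the Bayes database is actually being trained is to
look at its message counters. The sketch below is illustrative only: the
helper name bayes_counts is made up here, and it assumes the usual
`sa-learn --dump magic` layout in which the count is the third column of
the "nspam" and "nham" lines. With stock settings the Bayes rules stay
inactive until roughly 200 spam and 200 ham messages have been learned,
so small numbers here would explain a filter that seems to do nothing.

```shell
#!/bin/sh
# Sketch: read `sa-learn --dump magic` output on stdin and print how many
# spam and ham messages the Bayes database has learned so far.
# Assumption: the count is the third whitespace-separated column.
bayes_counts() {
    awk '
        /non-token data: nspam/ { spam = $3 }
        /non-token data: nham/  { ham = $3 }
        END { printf "nspam=%d nham=%d\n", spam, ham }
    '
}

# Usage (requires SpamAssassin to be installed):
#   sa-learn --dump magic | bayes_counts
```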

I get a comparable amount of spam (~600 to 1000 per day) and my setup
*was* about 98% effective until a month or so ago. These days it is more
like 80%. I've noticed that much of the spam getting through appears to
be specifically targeted at getting by SA -- no HTML, a paragraph of
nonsense (or sometimes text from some public domain book), and a
one-liner trying to sell me a mortgage or something.

The one thing I had *not* been doing, but started to do as of last
night, is to use the false-negatives to explicitly train the Bayesian
filter. It was easy enough to set up. I created an hourly cron job as
follows:

/usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox

Now I just drop all false negatives into that mailbox, and clean them
out periodically. Hopefully that will make a significant improvement.
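
For anyone wanting to copy this, a minimal sketch of the hourly job
follows. The wrapper function learn_false_negatives and the SA_LEARN
override are hypothetical names introduced here for illustration, and the
mailbox path is the same placeholder as above. The only thing it adds
over the bare command is that it skips the run when the mailbox is
missing or empty.

```shell
#!/bin/sh
# Sketch of a wrapper for the hourly cron job. A plain crontab entry
# without the wrapper would look like:
#   0 * * * * /usr/bin/sa-learn --mbox --spam /path/to/false-neg.mbox

# Path to sa-learn; overridable through the environment (an assumption
# made for this sketch, handy for testing).
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

learn_false_negatives() {
    mbox="$1"
    # Nothing to learn from if the mailbox is missing or empty.
    if [ ! -s "$mbox" ]; then
        return 0
    fi
    # Feed every message in the mbox to the Bayes filter as spam.
    "$SA_LEARN" --mbox --spam "$mbox"
}

# Example call (placeholder path):
#   learn_false_negatives /path/to/false-neg.mbox
```

sa-learn keeps track of messages it has already learned, so leaving old
false negatives in the mailbox between runs should be harmless; cleaning
it out mainly keeps each run short.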

Joe
