Re: Postgres vs. DSpam

From: Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>
To: Evilio del Rio <edelrio(at)cmima(dot)csic(dot)es>
Cc: dspam-users(at)networkdweebs(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres vs. DSpam
Date: 2004-11-26 01:37:12
Message-ID: 1101433032.16058.57.camel@lamb.mcmillan.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
> Hi,
>
> I have installed the dspam filter
> (http://www.nuclearelephant.com/projects/dspam) on our mail server
> (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
> with a quite low traffic of 4000 messages/day. So it's a quite common
> platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here. I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
WHERE (innocent_hits*2) + spam_hits < 5
AND CURRENT_DATE - last_hit > 60;

DELETE FROM dspam_token_data
WHERE innocent_hits = 1
AND CURRENT_DATE - last_hit > 30;

DELETE FROM dspam_token_data
WHERE CURRENT_DATE - last_hit > 180;

DELETE FROM dspam_signature_data
WHERE CURRENT_DATE - created_on > 14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;

I also occasionally do a "VACUUM FULL ANALYZE;" on the database as well.

In all honesty though, I think that MySQL is better suited to DSpam than
PostgreSQL is.

> Please, could anyone explain me this difference?
> Is Postgres that bad?
> Is MySQL that good?
> Am I the only one to have observed this behavior?

I believe that what DSpam does that is not well-catered for in the way
PostgreSQL operates, is that it does very frequent updates to rows in
(eventually) quite large tables. In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted. That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.

The reason that PostgreSQL operates this way, is a direct result of the
way transactional support is implemented, and it may well change in a
version or two. It's got better over the last few versions, with things
like pg_autovacuum, but that approach still doesn't suit some types of
database updating.

Cheers,
Andrew.
-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267
These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
-------------------------------------------------------------------------

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Neil Conway 2004-11-26 03:25:25 Re: Postgres vs. DSpam
Previous Message Gary Doades 2004-11-25 20:35:25 Postgres backend using huge amounts of ram