
Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-29 21:48:00
Message-ID: 42EAA410.9010700@aptalaska.net
Lists: pgsql-performance
Ok, here is where I'm at, I reduced the proc down to this:

CREATE FUNCTION update_token (_id INTEGER,
                              _token BYTEA,
                              _spam_count INTEGER,
                              _ham_count INTEGER,
                              _atime INTEGER)
RETURNS VOID AS
$$
BEGIN
  LOOP
    UPDATE bayes_token
      SET spam_count = spam_count + _spam_count,
          ham_count  = ham_count + _ham_count,
          atime      = _atime
      WHERE id = _id
      AND token = _token;

    IF FOUND THEN
      RETURN;
    END IF;

    INSERT INTO bayes_token
      VALUES (_id, _token, _spam_count, _ham_count, _atime);

    IF FOUND THEN
      UPDATE bayes_vars SET token_count = token_count + 1 WHERE id = _id;
      IF NOT FOUND THEN
        RAISE EXCEPTION 'unable to update token_count in bayes_vars';
      END IF;
    END IF;

    RETURN;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

All it's doing is trying the UPDATE before the INSERT to work around not
knowing which one is needed.  With only 2-3 of the queries implemented,
I'm already back to running at about the same speed as the original SA
proc that is going to ship with SA 3.1.0.
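For what it's worth, the update-then-insert pattern above is usually wrapped in a unique_violation handler, which is also what the LOOP is for in the upsert example in the PostgreSQL docs.  A minimal sketch of the loop body hardened that way (assuming a unique index on (id, token); this is the docs-style pattern, not necessarily what SA will ship):

```sql
LOOP
  UPDATE bayes_token
     SET spam_count = spam_count + _spam_count,
         ham_count  = ham_count  + _ham_count,
         atime      = _atime
   WHERE id = _id AND token = _token;
  IF FOUND THEN
    RETURN;
  END IF;

  BEGIN
    INSERT INTO bayes_token VALUES (_id, _token, _spam_count, _ham_count, _atime);
    UPDATE bayes_vars SET token_count = token_count + 1 WHERE id = _id;
    RETURN;
  EXCEPTION WHEN unique_violation THEN
    -- another session inserted the row first; loop and retry the UPDATE
  END;
END LOOP;
```

Without the handler, two backends racing on the same new token can make one of them error out on the INSERT.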

All of the queries are using indexes, so at this point I'm pretty
convinced that the biggest problem is the sheer number of queries
required to run this proc 200 times for each email (once per token).

I don't see anything that could be done to make this much faster on the
postgres end; it's looking like the solution is going to involve cutting
down the number of queries somehow.
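One way to cut the query count would be to pass all of a message's tokens to a single proc call and loop on the server side, so the 200 round trips become one.  A hypothetical sketch (the name update_tokens and the array parameter are mine, not SA's, and it assumes the per-message counts are the same for every token):

```sql
CREATE FUNCTION update_tokens (_id INTEGER,
                               _tokens BYTEA[],
                               _spam_count INTEGER,
                               _ham_count INTEGER,
                               _atime INTEGER)
RETURNS VOID AS
$$
DECLARE
  i INTEGER;
BEGIN
  -- One round trip for all ~200 tokens instead of 200 separate calls.
  FOR i IN array_lower(_tokens, 1) .. array_upper(_tokens, 1) LOOP
    PERFORM update_token(_id, _tokens[i], _spam_count, _ham_count, _atime);
  END LOOP;
END;
$$
LANGUAGE plpgsql;
```

The per-token work is the same; the savings would come purely from eliminating the client/server round trips and per-statement overhead.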

One thing that is still very puzzling to me is why this runs so much
slower when I put the data.sql in a transaction.  Obviously transactions
behave differently when you call a proc a zillion times versus running
plain INSERT queries.
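For reference, by "in a transaction" I just mean wrapping the whole data.sql load in one transaction, something like this (the literal values here are made up for illustration):

```sql
BEGIN;
SELECT update_token(1, 'sometoken'::bytea, 1, 0, 1122672480);
-- ... one call per token, roughly 200 per message ...
COMMIT;
```

versus letting each call commit on its own.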

Anyway, if anyone else has any ideas I'm all ears, but at this point
it's looking like raw query speed is what this app needs, and while I
don't care for mysql as a database, it does have the speed going for it.

schu
