Re: Performance problems testing with Spamassassin 3.1.0

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
Cc: josh(at)agliodbs(dot)com, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-29 06:48:48
Message-ID: Pine.LNX.4.44.0507290835100.7470-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 27 Jul 2005, Matthew Schumacher wrote:

> Then they do this to insert the token:
>
> INSERT INTO bayes_token (
> id,
> token,
> spam_count,
> ham_count,
> atime
> ) VALUES (
> ?,
> ?,
> ?,
> ?,
> ?
> ) ON DUPLICATE KEY
> UPDATE
> spam_count = GREATEST(spam_count + ?, 0),
> ham_count = GREATEST(ham_count + ?, 0),
> atime = GREATEST(atime, ?)
>
> Or update the token:
>
> UPDATE bayes_vars SET
> $token_count_update
> newest_token_age = GREATEST(newest_token_age, ?),
> oldest_token_age = LEAST(oldest_token_age, ?)
> WHERE id = ?
>
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2
functions that work for the above cases of greatest:

CREATE FUNCTION greatest_int (integer, integer)
RETURNS INTEGER
IMMUTABLE STRICT
AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
RETURNS INTEGER
IMMUTABLE STRICT
AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
LANGUAGE SQL;

and these should be inlined by pg and very fast to execute.

I wrote a function that should do what the insert above does. The update
I've not looked at (I don't know what $token_count_update is) but the
update looks simple enough to just implement the same way in pg as in
mysql.

For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER,
xtoken BYTEA,
xspam_count INTEGER,
xham_count INTEGER,
xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
LOOP
UPDATE bayes_token
SET spam_count = greatest_int (spam_count + xspam_count, 0),
ham_count = greatest_int (ham_count + xham_count, 0),
atime = greatest_int (atime, xatime)
WHERE id = xid
AND token = xtoken;

IF found THEN
RETURN;
END IF;

BEGIN
INSERT INTO bayes_token VALUES (xid,
xtoken,
xspam_count,
xham_count,
xatime);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster then what you have.
What it does do is mimic the way you insert values in mysql. It only work
on pg 8.0 and later however since the exception handling was added in 8.0.

--
/Dennis Björklund

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Luke Lonergan 2005-07-29 07:01:07 Re: Performance problems testing with Spamassassin
Previous Message Gavin Sherry 2005-07-29 05:58:24 Re: Performance problems testing with Spamassassin 3.1.0