
Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-29 21:48:00
Message-ID: 42EAA410.9010700@aptalaska.net
Lists: pgsql-performance
Ok, here is where I'm at, I reduced the proc down to this:

CREATE FUNCTION update_token (_id INTEGER,
                              _token BYTEA,
                              _spam_count INTEGER,
                              _ham_count INTEGER,
                              _atime INTEGER)
RETURNS VOID AS
$$
BEGIN
  LOOP
    UPDATE bayes_token
      SET spam_count = spam_count + _spam_count,
          ham_count  = ham_count + _ham_count,
          atime      = _atime
      WHERE id = _id
      AND token = _token;

    IF FOUND THEN
      RETURN;
    END IF;

    INSERT INTO bayes_token
      VALUES (_id, _token, _spam_count, _ham_count, _atime);

    IF FOUND THEN
      UPDATE bayes_vars SET token_count = token_count + 1 WHERE id = _id;
      IF NOT FOUND THEN
        RAISE EXCEPTION 'unable to update token_count in bayes_vars';
      END IF;
    END IF;

    RETURN;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

All it's doing is trying the UPDATE before the INSERT to work around not
knowing which one is needed.  With only 2-3 of the queries implemented,
I'm already back to running at about the same speed as the original SA
proc that is going to ship with SA 3.1.0.
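For what it's worth, the update-then-insert pattern above is usually wrapped in a unique_violation handler, which is also what the LOOP is for in the upsert example in the PostgreSQL docs.  A minimal sketch of the loop body hardened that way (assuming a unique index on (id, token); this is the docs-style pattern, not necessarily what SA will ship):

```sql
LOOP
  UPDATE bayes_token
     SET spam_count = spam_count + _spam_count,
         ham_count  = ham_count  + _ham_count,
         atime      = _atime
   WHERE id = _id AND token = _token;
  IF FOUND THEN
    RETURN;
  END IF;

  BEGIN
    INSERT INTO bayes_token VALUES (_id, _token, _spam_count, _ham_count, _atime);
    UPDATE bayes_vars SET token_count = token_count + 1 WHERE id = _id;
    RETURN;
  EXCEPTION WHEN unique_violation THEN
    -- another session inserted the row first; loop and retry the UPDATE
  END;
END LOOP;
```

Without the handler, two backends racing on the same new token can make one of them error out on the INSERT.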

All of the queries are using indexes, so at this point I'm pretty
convinced that the biggest problem is the sheer number of queries
required to run this proc 200 times for each email (once per token).

I don't see anything that could be done to make this much faster on the
postgres end; it's looking like the solution is going to involve cutting
down the number of queries somehow.
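One way to cut the query count would be to pass all of a message's tokens to a single proc call and loop on the server side, so the 200 round trips become one.  A hypothetical sketch (the name update_tokens and the array parameter are mine, not SA's, and it assumes the per-message counts are the same for every token):

```sql
CREATE FUNCTION update_tokens (_id INTEGER,
                               _tokens BYTEA[],
                               _spam_count INTEGER,
                               _ham_count INTEGER,
                               _atime INTEGER)
RETURNS VOID AS
$$
DECLARE
  i INTEGER;
BEGIN
  -- One round trip for all ~200 tokens instead of 200 separate calls.
  FOR i IN array_lower(_tokens, 1) .. array_upper(_tokens, 1) LOOP
    PERFORM update_token(_id, _tokens[i], _spam_count, _ham_count, _atime);
  END LOOP;
END;
$$
LANGUAGE plpgsql;
```

The per-token work is the same; the savings would come purely from eliminating the client/server round trips and per-statement overhead.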

One thing that is still very puzzling to me is why this runs so much
slower when I put the data.sql in a transaction.  Obviously transactions
behave differently when you call a proc a zillion times versus running
plain INSERT queries.
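For reference, by "in a transaction" I just mean wrapping the whole data.sql load in one transaction, something like this (the literal values here are made up for illustration):

```sql
BEGIN;
SELECT update_token(1, 'sometoken'::bytea, 1, 0, 1122672480);
-- ... one call per token, roughly 200 per message ...
COMMIT;
```

versus letting each call commit on its own.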

Anyway, if anyone else has any ideas I'm all ears, but at this point
it's looking like raw query speed is what this app needs, and while I
don't care for mysql as a database, it does have the speed going for it.

schu
