Re: Performance problems testing with Spamassassin 3.1.0

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-31 18:19:11
Message-ID: 42ED161F.1020408@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
>
>>Ok, here is the current plan.
>>
>>Change the spamassassin API to pass a hash of tokens into the storage
>>module, pass the tokens to the proc as an array, start a transaction,
>>load the tokens into a temp table using copy, select the tokens distinct
>>into the token table for new tokens, update the token table for known
>>tokens, then commit.
>
>
> You might consider:
> UPDATE tokens
> FROM temp_table (this updates existing records)
>
> INSERT INTO tokens
> SELECT ...
> FROM temp_table
> WHERE NOT IN (SELECT ... FROM tokens)
>
> This way you don't do an update to newly inserted tokens, which helps
> keep vacuuming needs in check.

The subselect might be quite a big set, so avoiding a full table scan
and materialization by

DELETE temp_table
WHERE key IN (select key FROM tokens JOIN temp_table);
INSERT INTO TOKENS SELECT * FROM temp_table;

or

INSERT INTO TOKENS
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL

might be an additional win, assuming that only a small fraction of
tokens is inserted and updated.

Regards,
Andreas

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message William Yu 2005-07-31 21:11:58 Re: Performance problems on 4/8way Opteron (dualcore)
Previous Message Jim C. Nasby 2005-07-31 17:10:12 Re: Performance problems testing with Spamassassin 3.1.0