Skip site navigation (1) Skip section navigation (2)

Re: Performance problems testing with Spamassassin 3.1.0

From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-31 18:19:11
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
>>Ok, here is the current plan.
>>Change the spamassassin API to pass a hash of tokens into the storage
>>module, pass the tokens to the proc as an array, start a transaction,
>>load the tokens into a temp table using copy, select the tokens distinct
>>into the token table for new tokens, update the token table for known
>>tokens, then commit.
> You might consider:
> UPDATE tokens
>     FROM temp_table (this updates existing records)
> INSERT INTO tokens
>     SELECT ...
>     FROM temp_table
>     WHERE NOT IN (SELECT ... FROM tokens)
> This way you don't do an update to newly inserted tokens, which helps
> keep vacuuming needs in check.

The subselect might be quite a big set, so avoiding a full table scan 
and materialization by

DELETE temp_table
   WHERE key IN (select key FROM tokens JOIN temp_table);


SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL

might be an additional win, assuming that only a small fraction of 
tokens is inserted and updated.


In response to

pgsql-performance by date

Next:From: William YuDate: 2005-07-31 21:11:58
Subject: Re: Performance problems on 4/8way Opteron (dualcore)
Previous:From: Jim C. NasbyDate: 2005-07-31 17:10:12
Subject: Re: Performance problems testing with Spamassassin 3.1.0

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group