Re: Performance problems testing with Spamassassin

From: Karim Nassar <karim(dot)nassar(at)acm(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>, Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin
Date: 2005-07-30 00:39:41
Message-ID: 1122683983.11869.150.camel@localhost.localdomain
Lists: pgsql-performance

On Fri, 2005-07-29 at 09:47 -0700, Josh Berkus wrote:
> Try changing:
> wal_buffers = 256
>
> and try Bruce's stop full_page_writes patch.
>
> > I guess we see the real culprit here. Anyone surprised it's the WAL?
>
> Nope. On high-end OLTP stuff, it's crucial that the WAL have its own
> dedicated disk resource.
>
> Also, running a complex stored procedure for each and every word in each
> e-mail is rather deadly ... with the e-mail traffic our server at Globix
> receives, for example, that would amount to running it about 1,000 times a
> minute.

Is this a real-world fix? Seems to me that SpamAssassin runs on a
plethora of mail servers, and tuning his/her/my/your pg config
doesn't solve the root problem: thousands of (seemingly)
high-overhead function calls are being executed.

> It would be far better to batch this, somehow, maybe using temp
> tables.

Agreed. On my G4 laptop running the default-configured Ubuntu Linux
postgresql 7.4.7 package, it took 43 minutes for Matthew's script to run
(I ran it twice just to be sure). In my spare time over the last day, I
created a brute-force Perl script that took under 6 minutes. Am I on to
something, or did I just optimize for *my* system?

http://ccl.cens.nau.edu/~kan4/files/k-bayesBenchmark.tar.gz

kan4(at)slap-happy:~/k-bayesBenchmark$ time ./test.pl
<-- snip db creation stuff -->
17:18:44 -- START
17:19:37 -- AFTER TEMP LOAD : loaded 120596 records
17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into bayes_token
17:19:50 -- AFTER bayes_vars UPDATE : updated 1 records
17:23:37 -- AFTER bayes_token UPDATE : updated 47537 records
DONE

real 5m4.551s
user 0m29.442s
sys 0m3.925s
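
For readers who don't want to pull the tarball, the general shape of the steps in the log above (stage everything in a temp table, then do one set-based UPDATE and one set-based INSERT) can be sketched roughly as follows. This is only an illustration, not the script in the tarball: it uses Python with sqlite3 as a stand-in for PostgreSQL so it runs anywhere, and the two-column bayes_token layout (token, spam_count), the token_stage temp table, and the batch_learn function are all hypothetical simplifications.

```python
import sqlite3

def batch_learn(conn, tokens):
    """Apply a batch of tokens with a few set-based statements
    instead of one function call per token.

    Assumes a simplified, hypothetical table:
        bayes_token(token TEXT PRIMARY KEY, spam_count INTEGER)
    """
    cur = conn.cursor()

    # 1. Stage the whole batch in a temp table (hypothetical name).
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS token_stage (token TEXT, hits INTEGER)"
    )
    cur.execute("DELETE FROM token_stage")
    cur.executemany(
        "INSERT INTO token_stage (token, hits) VALUES (?, 1)",
        [(t,) for t in tokens],
    )

    # 2. Bump counts for tokens that already exist, in one statement.
    cur.execute(
        "UPDATE bayes_token "
        "SET spam_count = spam_count + "
        "    (SELECT SUM(s.hits) FROM token_stage s "
        "      WHERE s.token = bayes_token.token) "
        "WHERE token IN (SELECT token FROM token_stage)"
    )

    # 3. Insert tokens not seen before, also in one statement.
    #    Running this after the UPDATE avoids counting new tokens twice.
    cur.execute(
        "INSERT INTO bayes_token (token, spam_count) "
        "SELECT s.token, SUM(s.hits) FROM token_stage s "
        "WHERE s.token NOT IN (SELECT token FROM bayes_token) "
        "GROUP BY s.token"
    )
    conn.commit()
```

The point of the design is that the database sees three or four statements per batch rather than one procedure call per word, so per-call overhead and WAL traffic are paid far fewer times. On PostgreSQL the staging step would more likely be a COPY into the temp table, but that is an assumption about how one might port the sketch.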

I am sure someone smarter could optimize further.

Anyone with a super-spifty machine wanna see if there is an improvement
here?

--
Karim Nassar <karim(dot)nassar(at)acm(dot)org>
