Re: Performance problems testing with Spamassassin

From: Karim Nassar <karim(dot)nassar(at)acm(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>,Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>,Luke Lonergan <LLonergan(at)greenplum(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin
Date: 2005-07-30 00:39:41
Message-ID: 1122683983.11869.150.camel@localhost.localdomain
Lists: pgsql-performance
On Fri, 2005-07-29 at 09:47 -0700, Josh Berkus wrote:
> Try changing:
> wal_buffers = 256
> 
> and try Bruce's stop full_page_writes patch.
> 
> > I guess we see the real culprit here.  Anyone surprised it's the WAL?
> 
> Nope.  On high-end OLTP stuff, it's crucial that the WAL have its own 
> dedicated disk resource.
> 
> Also, running a complex stored procedure for each and every word in each 
> e-mail is rather deadly ... with the e-mail traffic our server at Globix 
> receives, for example, that would amount to running it about 1,000 times a 
> minute.  

Is this a real-world fix? It seems to me that SpamAssassin runs on a
plethora of mail servers, and optimizing his/her/my/your pg config
doesn't solve the root problem: thousands of (seemingly) high-overhead
function calls are being executed.


> It would be far better to batch this, somehow, maybe using temp 
> tables.

Agreed. On my G4 laptop running the default-configured Ubuntu Linux
postgresql 7.4.7 package, it took 43 minutes for Matthew's script to run
(I ran it twice just to be sure). In my spare time over the last day, I
created a brute-force Perl script that took under 6 minutes. Am I on to
something, or did I just optimize for *my* system?

http://ccl.cens.nau.edu/~kan4/files/k-bayesBenchmark.tar.gz

kan4(at)slap-happy:~/k-bayesBenchmark$ time ./test.pl
<-- snip db creation stuff -->
17:18:44 -- START
17:19:37 -- AFTER TEMP LOAD : loaded 120596 records
17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into bayes_token
17:19:50 -- AFTER bayes_vars UPDATE : updated 1 records
17:23:37 -- AFTER bayes_token UPDATE : updated 47537 records
DONE

real    5m4.551s
user    0m29.442s
sys     0m3.925s
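
For anyone who doesn't want to unpack the tarball, here is a rough sketch of
the shape of the batched approach the log above reflects: stage the tokens in
a temp table, then touch bayes_token and bayes_vars with a few set-based
statements instead of one function call per token. This is not the Perl
script itself. It is Python against an in-memory SQLite database so it runs
anywhere, and the column layout, the one-column bayes_vars, and the names
`make_db`, `batch_learn` and `staged` are assumptions made up for the
illustration.

```python
import sqlite3

# Simplified, assumed schema: the real SpamAssassin tables have more columns.
SCHEMA = """
CREATE TABLE bayes_token (
    token      TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0,
    atime      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE bayes_vars (token_count INTEGER NOT NULL DEFAULT 0);
INSERT INTO bayes_vars (token_count) VALUES (0);
"""


def make_db():
    """Create an in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def batch_learn(conn, rows, now):
    """Apply (token, spam_delta, ham_delta) rows in bulk.

    Returns the number of tokens that were new to bayes_token.
    """
    cur = conn.cursor()

    # 1. TEMP LOAD: stage every incoming row in one bulk insert.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS staged "
        "(token TEXT, spam INTEGER, ham INTEGER)"
    )
    cur.execute("DELETE FROM staged")
    cur.executemany("INSERT INTO staged VALUES (?, ?, ?)", rows)

    # 2. bayes_token INSERT: add unseen tokens with zero counts.
    before = cur.execute("SELECT COUNT(*) FROM bayes_token").fetchone()[0]
    cur.execute(
        "INSERT INTO bayes_token (token, spam_count, ham_count, atime) "
        "SELECT DISTINCT s.token, 0, 0, ? FROM staged s "
        "WHERE NOT EXISTS "
        "(SELECT 1 FROM bayes_token b WHERE b.token = s.token)",
        (now,),
    )
    after = cur.execute("SELECT COUNT(*) FROM bayes_token").fetchone()[0]

    # 3. bayes_vars UPDATE: refresh the summary row once per batch.
    cur.execute(
        "UPDATE bayes_vars "
        "SET token_count = (SELECT COUNT(*) FROM bayes_token)"
    )

    # 4. bayes_token UPDATE: add the summed deltas to every staged token.
    cur.execute(
        "UPDATE bayes_token SET "
        "spam_count = spam_count + (SELECT SUM(s.spam) FROM staged s "
        "WHERE s.token = bayes_token.token), "
        "ham_count = ham_count + (SELECT SUM(s.ham) FROM staged s "
        "WHERE s.token = bayes_token.token), "
        "atime = ? "
        "WHERE token IN (SELECT token FROM staged)",
        (now,),
    )
    conn.commit()
    return after - before
```

Calling `batch_learn(conn, rows, now)` once per message (or per batch of
messages) issues a fixed number of statements regardless of how many tokens
are in `rows`. New tokens are inserted with zero counts first, so the final
UPDATE can treat new and existing tokens the same way without
double-counting.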


I am sure someone smarter could optimize further.

Anyone with a super-spifty machine wanna see if there is an improvement
here?

-- 
Karim Nassar <karim(dot)nassar(at)acm(dot)org>

