Skip site navigation (1) Skip section navigation (2)

Re: Performance problems testing with Spamassassin 3.1.0

From: John Arbash Meinel <john(at)arbash-meinel(dot)com>
To: John Arbash Meinel <john(at)arbash-meinel(dot)com>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-31 15:52:14
Message-ID: 42ECF3AE.6060809@arbash-meinel.com (view raw or flat)
Thread:
Lists: pgsql-performance
John Arbash Meinel wrote:

>Matthew Schumacher wrote:
>
>  
>
>>All it's doing is trying the update before the insert to get around the
>>problem of not knowing which is needed.  With only 2-3 of the queries
>>implemented I'm already back to running about the same speed as the
>>original SA proc that is going to ship with SA 3.1.0.
>>
>>All of the queries are using indexes so at this point I'm pretty
>>convinced that the biggest problem is the sheer number of queries
>>required to run this proc 200 times for each email (once for each token).
>>
>>I don't see anything that could be done to make this much faster on the
>>postgres end, it's looking like the solution is going to involve cutting
>>down the number of queries some how.
>>
>>One thing that is still very puzzling to me is why this runs so much
>>slower when I put the data.sql in a transaction.  Obviously transactions
>>are acting different when you call a proc a zillion times vs an insert
>>query.
>> 
>>
>>    
>>
>Well, I played with adding a COMMIT;BEGIN; statement to your exact test
>every 1000 lines. And this is what I got:
>  
>
Just for reference, I also tested this on my old server, which is a dual
Celeron 450 with 256M ram. FC4 and Postgres 8.0.3
Unmodified:
real    54m15.557s
user    0m24.328s
sys     0m14.200s

With Transactions every 1000 selects, and vacuum every 5000:
real    8m36.528s
user    0m16.585s
sys     0m12.569s

With Transactions every 1000 selects, and vacuum every 10000:
real    7m50.748s
user    0m16.183s
sys     0m12.489s

On this machine vacuum is more expensive, since it doesn't have as much ram.

Anyway, on this machine, I see approx 7x improvement. Which I think is
probably going to satisfy your spamassassin needs.
John
=:->

PS> Looking forward to having a spamassassin that can utilize my
favorite db. Right now, I'm not using a db backend because it wasn't
worth setting up mysql.

>Unmodified:
>real    17m53.587s
>user    0m6.204s
>sys     0m3.556s
>
>With BEGIN/COMMIT:
>real    1m53.466s
>user    0m5.203s
>sys     0m3.211s
>
>So I see the potential for improvement almost 10 fold by switching to
>transactions. I played with the perl script (and re-implemented it in
>python), and for the same data as the perl script, using COPY instead of
>INSERT INTO means 5s instead of 33s.
>
>I also played around with adding VACUUM ANALYZE every 10 COMMITS, which
>brings the speed to:
>
>real    1m41.258s
>user    0m5.394s
>sys     0m3.212s
>
>And doing VACUUM ANALYZE every 5 COMMITS makes it:
>real    1m46.403s
>user    0m5.597s
>sys     0m3.244s
>
>I'm assuming the slowdown is because of the extra time spent vacuuming.
>Overall performance might still be improving, since you wouldn't
>actually be inserting all 100k rows at once.
>  
>
...

>This is all run on Ubuntu, with postgres 7.4.7, and a completely
>unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with
>3GB of RAM).
>
>John
>=:->
>  
>

In response to

Responses

pgsql-performance by date

Next:From: Matthew SchumacherDate: 2005-07-31 16:51:06
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Previous:From: J. Andrew RogersDate: 2005-07-31 15:29:15
Subject: Re: Performance problems on 4/8way Opteron (dualcore)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group