Re: Performance problems testing with Spamassassin 3.1.0

From: John Arbash Meinel <john(at)arbash-meinel(dot)com>
To: John Arbash Meinel <john(at)arbash-meinel(dot)com>
Cc: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-31 15:52:14
Message-ID: 42ECF3AE.6060809@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

John Arbash Meinel wrote:

>Matthew Schumacher wrote:
>
>
>
>>All it's doing is trying the update before the insert to get around the
>>problem of not knowing which is needed. With only 2-3 of the queries
>>implemented I'm already back to running about the same speed as the
>>original SA proc that is going to ship with SA 3.1.0.
>>
>>All of the queries are using indexes so at this point I'm pretty
>>convinced that the biggest problem is the sheer number of queries
>>required to run this proc 200 times for each email (once for each token).
>>
>>I don't see anything that could be done to make this much faster on the
>>postgres end, it's looking like the solution is going to involve cutting
>>down the number of queries some how.
>>
>>One thing that is still very puzzling to me is why this runs so much
>>slower when I put the data.sql in a transaction. Obviously transactions
>>are acting different when you call a proc a zillion times vs an insert
>>query.
>>
>>
>>
>>
>Well, I played with adding a COMMIT;BEGIN; statement to your exact test
>every 1000 lines. And this is what I got:
>
>
Just for reference, I also tested this on my old server, which is a dual
Celeron 450 with 256M ram. FC4 and Postgres 8.0.3
Unmodified:
real 54m15.557s
user 0m24.328s
sys 0m14.200s

With Transactions every 1000 selects, and vacuum every 5000:
real 8m36.528s
user 0m16.585s
sys 0m12.569s

With Transactions every 1000 selects, and vacuum every 10000:
real 7m50.748s
user 0m16.183s
sys 0m12.489s

On this machine vacuum is more expensive, since it doesn't have as much ram.

Anyway, on this machine, I see approx 7x improvement. Which I think is
probably going to satisfy your spamassassin needs.
John
=:->

PS> Looking forward to having a spamassassin that can utilize my
favorite db. Right now, I'm not using a db backend because it wasn't
worth setting up mysql.

>Unmodified:
>real 17m53.587s
>user 0m6.204s
>sys 0m3.556s
>
>With BEGIN/COMMIT:
>real 1m53.466s
>user 0m5.203s
>sys 0m3.211s
>
>So I see the potential for improvement almost 10 fold by switching to
>transactions. I played with the perl script (and re-implemented it in
>python), and for the same data as the perl script, using COPY instead of
>INSERT INTO means 5s instead of 33s.
>
>I also played around with adding VACUUM ANALYZE every 10 COMMITS, which
>brings the speed to:
>
>real 1m41.258s
>user 0m5.394s
>sys 0m3.212s
>
>And doing VACUUM ANALYZE every 5 COMMITS makes it:
>real 1m46.403s
>user 0m5.597s
>sys 0m3.244s
>
>I'm assuming the slowdown is because of the extra time spent vacuuming.
>Overall performance might still be improving, since you wouldn't
>actually be inserting all 100k rows at once.
>
>
...

>This is all run on Ubuntu, with postgres 7.4.7, and a completely
>unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with
>3GB of RAM).
>
>John
>=:->
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Schumacher 2005-07-31 16:51:06 Re: Performance problems testing with Spamassassin 3.1.0
Previous Message J. Andrew Rogers 2005-07-31 15:29:15 Re: Performance problems on 4/8way Opteron (dualcore)