John Arbash Meinel wrote:
>Matthew Schumacher wrote:
>>All it's doing is trying the update before the insert to get around the
>>problem of not knowing which is needed. With only 2-3 of the queries
>>implemented I'm already back to running about the same speed as the
>>original SA proc that is going to ship with SA 3.1.0.
>>All of the queries are using indexes so at this point I'm pretty
>>convinced that the biggest problem is the sheer number of queries
>>required to run this proc 200 times for each email (once for each token).
>>I don't see anything that could be done to make this much faster on the
>>postgres end; it's looking like the solution is going to involve cutting
>>down the number of queries somehow.
>>One thing that is still very puzzling to me is why this runs so much
>>slower when I put the data.sql in a transaction. Obviously transactions
>>are acting differently when you call a proc a zillion times vs an insert
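
(A rough sketch of the update-before-insert pattern described above, for readers who have not seen the proc. Everything here is an assumption made for illustration: the table and column names are hypothetical, the real code is a stored procedure rather than client-side Python, and sqlite3 stands in for Postgres only so the example runs without a server.)

```python
import sqlite3

def put_token(conn, token, spam_delta, ham_delta):
    """Try the UPDATE first; fall back to INSERT only if no row matched."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE bayes_token "
        "SET spam_count = spam_count + ?, ham_count = ham_count + ? "
        "WHERE token = ?",
        (spam_delta, ham_delta, token),
    )
    if cur.rowcount == 0:
        # No existing row for this token, so create one.
        cur.execute(
            "INSERT INTO bayes_token (token, spam_count, ham_count) "
            "VALUES (?, ?, ?)",
            (token, spam_delta, ham_delta),
        )

def make_demo_db():
    """Create an in-memory table with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        "token TEXT PRIMARY KEY, spam_count INTEGER, ham_count INTEGER)"
    )
    return conn
```

Note that this costs one or two statements per token, which is the per-token query overhead the message is worried about. It is also not safe if two sessions try to insert the same new token at the same time.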
>Well, I played with adding a COMMIT;BEGIN; statement to your exact test
>every 1000 lines. And this is what I got:
Just for reference, I also tested this on my old server, which is a dual
Celeron 450 with 256MB of RAM, running FC4 and Postgres 8.0.3.
With Transactions every 1000 selects, and vacuum every 5000:
With Transactions every 1000 selects, and vacuum every 10000:
On this machine vacuum is more expensive, since it doesn't have as much RAM.
Anyway, on this machine I see an approx. 7x improvement, which I think is
probably going to satisfy your spamassassin needs.
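
(A minimal sketch of the batching used for these runs: wrap every 1000 statements in a transaction and vacuum after every few commits. The function name, its parameters, and the table name are assumptions for illustration, not the script that was actually run. VACUUM cannot run inside a transaction block, so it is emitted between COMMIT and the next BEGIN.)

```python
def batch_statements(statements, commit_every=1000, vacuum_every=5,
                     table="bayes_token"):
    """Yield SQL lines with COMMIT/BEGIN inserted every `commit_every`
    statements and VACUUM ANALYZE after every `vacuum_every` commits.

    `table` is a hypothetical table name; set vacuum_every=0 to skip vacuum.
    """
    yield "BEGIN;"
    commits = 0
    for count, stmt in enumerate(statements, start=1):
        yield stmt
        if count % commit_every == 0:
            yield "COMMIT;"
            commits += 1
            if vacuum_every and commits % vacuum_every == 0:
                # Outside any transaction at this point, as VACUUM requires.
                yield f"VACUUM ANALYZE {table};"
            yield "BEGIN;"
    # Close the last (possibly empty) transaction.
    yield "COMMIT;"
```

With commit_every=1000 and vacuum_every=5 this corresponds to "transactions every 1000 selects, vacuum every 5000"; vacuum_every=10 gives the 10000 case. The output can be written to a file and fed to psql.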
PS> Looking forward to having a spamassassin that can utilize my
favorite db. Right now, I'm not using a db backend because it wasn't
worth setting up mysql.
>So I see the potential for improvement almost 10 fold by switching to
>transactions. I played with the perl script (and re-implemented it in
>python), and for the same data as the perl script, using COPY instead of
>INSERT INTO means 5s instead of 33s.
>I also played around with adding VACUUM ANALYZE every 10 COMMITS, which
>brings the speed to:
>And doing VACUUM ANALYZE every 5 COMMITS makes it:
>I'm assuming the slowdown is because of the extra time spent vacuuming.
>Overall performance might still be improving, since you wouldn't
>actually be inserting all 100k rows at once.
>This is all run on Ubuntu, with postgres 7.4.7, and a completely
>unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with
>3GB of RAM).
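
(A sketch of what "using COPY instead of INSERT INTO" can look like when generating a load script: one COPY ... FROM stdin block in the text format psql accepts, instead of one INSERT per row. The helper names and the example table are hypothetical; this is not the python re-implementation mentioned above.)

```python
def copy_escape(value):
    """Encode one field for COPY text format (None becomes \\N)."""
    if value is None:
        return "\\N"
    text = str(value)
    # Backslash must be escaped first, then the delimiter and line breaks.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def copy_block(table, columns, rows):
    """Build a COPY ... FROM stdin block: header, tab-separated rows,
    and the end-of-data marker (a backslash followed by a period)."""
    lines = [f"COPY {table} ({', '.join(columns)}) FROM stdin;"]
    for row in rows:
        lines.append("\t".join(copy_escape(v) for v in row))
    lines.append("\\.")
    return "\n".join(lines) + "\n"
```

For example, copy_block("bayes_token", ["token", "spam_count"], rows) produces a single statement that loads every row, so the server parses one command rather than one per row. It only applies to plain inserts of new rows, not to the update-or-insert logic in the proc.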