Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
Date: 2010-09-28 18:19:10
Message-ID: 6911BFC8-8535-407D-B1A8-F94D459FBB42@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 28 Sep 2010, at 12:49, Ivan Sergio Borgonovo wrote:
> The hardware on the 2 machines is a bit different.
> MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on RAID
> 1 hw, 2 Xeon dual core (I can't check details right now)
> PG runs on a box that has more than 5 years, 3 SCSI drives on RAID 5
> hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, cpu fam
> 14, model 4)

RAID-5 isn't ideal for databases, the RAID-1 in the other box is probably faster. Especially since it's on more modern hardware.

> In both tables I've to update price.
> VACUUM FULL was performed just before updating the prices.

VACUUM FULL? Was that really necessary? You did REINDEX after that, didn't you? If not, your indexes became bloated.
If the table wasn't empty before you probably meant to do a VACUUM ANALYSE, but if it was, just ANALYSE would have been sufficient.

> MS SQL receives a large sql file that contain all the UPDATE
> statements.
> PG receive a csv file that is loaded into a table with COPY and then
> does the update as
> update products set price=p.price from temp_price where id=p.id and
> price<>p.price;

Did you ANALYSE between loading the data and updating?

Also, executing the trigger on each copied line is likely to be a little slow, due to the overhead of calling a stored procedure (especially if it's plpgsql). It's probably quite a bit faster to disable the trigger and create the gin-index after loading the data.

> MS SQL ingurgitate the whole sql file in around 10sec.
> pg takes more than 5 min to just run the single update statement.

An EXPLAIN ANALYSE of that statement would tell what it's doing and what's taking so long. A simple EXPLAIN would probably be sufficient to see what query plan it thinks it needs though.

Since an update means PG has to insert and delete records, the fact that the database is on RAID-5 is probably a factor here. How much? No idea.

> I'd like to know if such a large difference can be justified just by
> HW difference or by a difference in the process on how data are
> loaded [1] or by the difference in performance of the 2 servers on
> this kind of workload or by some postgres config before I decide how
> to manage my time to redesign the import procedure.

Did you tune that database? Several options (work_mem for example) could significantly improve your performance if you can set them higher (or reduce it if you set them too high). You can do that per session too.

> If HW can justify such huge difference I'll devote my time to other
> problems.

Partially, yes, but not that much I think.

> I'd say that a potential culprit could be the gin index. No matter
> if the tsvector is updated or not, if the row is changed I think the
> index is going to be updated anyway.

gin indexes require relatively much RAM. If you didn't assign much in your settings then it's quite possible that the database can't keep the index in memory or that things have to spill to disk. Leave enough room for the OS's disk cache though, Postgres benefits from that as well.

Is there any particular reason you went with a gin index and not a gist one? Gin can be faster, but consumes (much) more memory, but gist is also quite good with text searches and doesn't require quite as much memory.

> [1] I'd expect that excluding the time it takes to load the csv a
> single update should run faster than a huge list of single statement
> update

Correct.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4ca231ae678301692839670!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Larry Leszczynski 2010-09-28 18:23:23 Re: PostgreSQL 9 Mac OS X one-click install - PL/perl broken
Previous Message Eric McKeeth 2010-09-28 18:18:12 Re: Exclusion constraint issue