Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search

From: Allan Kamau <kamauallan(at)gmail(dot)com>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
Date: 2010-09-28 11:35:09
Message-ID: AANLkTimxpMyH1RuZhf0Nu+yebUpPjHEeX5BeN99GaaFU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Sep 28, 2010 at 1:49 PM, Ivan Sergio Borgonovo
<mail(at)webthatworks(dot)it> wrote:
> I know I'm comparing apples and orange but still the difference in
> performance was quite astonishing.
>
> I've 2 tables that look like:
>
> create table products(
>  id bigint
>  price double precision, /* legacy, don't ask */
>  sometextfield1 varchar(128),
>  sometextfield2 varchar(128),
>  ...
> );
>
> one on a MS SQL 2005 and another one on pg 8.3.
>
> MS SQL has full text search on the text fields (I don't know the
> details).
>
> pg product table has a tsvector field and a gin index defined on it +
> trigger that update the tsvector field when the textfields change.
> The trigger is made in a way that it actually update the tsvector
> just if the text fields are changed.
>
> The hardware on the 2 machines is a bit different.
> MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on RAID
> 1 hw, 2 Xeon dual core (I can't check details right now)
> PG runs on a box that has more than 5 years, 3 SCSI drives on RAID 5
> hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz, cpu fam
> 14, model 4)
>
> Both have 4Gb of ram.
> shared_buffers is 240Mb.
> Both share a similar workload.
> Both boxes were in the same "price class" when they were bought.
>
> In both tables I've to update price.
> VACUUM FULL was performed just before updating the prices.
>
> MS SQL receives a large sql file that contain all the UPDATE
> statements.
> PG receive a csv file that is loaded into a table with COPY and then
> does the update as
> update products set price=p.price from temp_price where id=p.id and
> price<>p.price;
>
> MS SQL ingurgitate the whole sql file in around 10sec.
> pg takes more than 5 min to just run the single update statement.
>
> I'd like to know if such a large difference can be justified just by
> HW difference or by a difference in the process on how data are
> loaded [1] or by the difference in performance of the 2 servers on
> this kind of workload or by some postgres config before I decide how
> to manage my time to redesign the import procedure.
>
> If HW can justify such huge difference I'll devote my time to other
> problems.
>
> I'd say that a potential culprit could be the gin index. No matter
> if the tsvector is updated or not, if the row is changed I think the
> index is going to be updated anyway.
> Somehow MS SQL circumvent this problem, possibly by building the
> equivalent of a tsvector column in a "hidden" table that
> automatically join to the "text" table.
> This add a join but reduce the cost of table modification since
> simpler (btree) indexes are faster to update.
>
> Still huge updates are rare and that table is mostly read and very
> rarely written. During unusually huge updates I may consider to drop
> the gin index.
>
> [1] I'd expect that excluding the time it takes to load the csv a
> single update should run faster than a huge list of single statement
> update
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Restarting the computer on which PG is running may help. I have access
to a server running PG 8.4 on Ubuntu and I have noticed that after a
day of intense use the PG slows down significantly, "free -g" reports
almost no free memory available (something seems to leak memory on
this Ubuntu box). But when I restart the OS (Ubuntu), the PG executes
my queries in good time. I seem not to have similar problems on the
other servers running Fedora 12 and 13. But it could be my
configuration(s) on the Ubuntu box at fault, I am still investigating.

Allan.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vick Khera 2010-09-28 12:15:31 Re: Scaling PostgreSQL-9
Previous Message Ivan Sergio Borgonovo 2010-09-28 10:49:03 huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search