Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: huge difference in performance between MS SQL and pg 8.3 on UPDATE with full text search
Date: 2010-09-28 21:48:16
Message-ID: 20100928234816.03e18905@dawn.webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 28 Sep 2010 20:19:10 +0200
Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:

> On 28 Sep 2010, at 12:49, Ivan Sergio Borgonovo wrote:
> > The hardware on the 2 machines is a bit different.
> > MS SQL 2005 runs on a box that is 2 years old, 2 SATA drives on
> > RAID 1 hw, 2 Xeon dual core (I can't check details right now)
> > PG runs on a box that has more than 5 years, 3 SCSI drives on
> > RAID 5 hw, 2 Xeon single core HT (Intel(R) Xeon(TM) CPU 3.20GHz,
> > cpu fam 14, model 4)

> > In both tables I've to update price.
> > VACUUM FULL was performed just before updating the prices.

> VACUUM FULL? Was that really necessary? You did REINDEX after
> that, didn't you? If not, your indexes became bloated. If the
> table wasn't empty before you probably meant to do a VACUUM
> ANALYSE, but if it was, just ANALYSE would have been sufficient.

...

> Did you ANALYSE between loading the data and updating?

I thought VACUUM FULL was more "magical" and implied a REINDEX.
Am I wrong?
The index that should be reindexed is the one on the pk, a simple
btree, that's not going to be as slow as rebuilding a gin... still
I'd really thought that VACUUM FULL implied a lot of things
(including ANALYZE as well).

> Also, executing the trigger on each copied line is likely to be a
> little slow, due to the overhead of calling a stored procedure
> (especially if it's plpgsql). It's probably quite a bit faster to
> disable the trigger and create the gin-index after loading the
> data.

I'll try to drop the trigger. I'm not expecting it the biggest
factor still adding something here and something there may end up in
the huge difference between the 2.
Anyway MS SQL seems to overcome all this nuisances auto-magically.

> An EXPLAIN ANALYSE of that statement would tell what it's doing
> and what's taking so long. A simple EXPLAIN would probably be
> sufficient to see what query plan it thinks it needs though.

I'll post the EXPLAIN. Before I run EXPLAIN ANALYZE I've to take
some precaution the DB doesn't explode.

> Did you tune that database? Several options (work_mem for example)
> could significantly improve your performance if you can set them
> higher (or reduce it if you set them too high). You can do that
> per session too.

pg is generally faster than the other MS SQL box on what's normally
done on a daily basis. Just large updates to the product page seems
to be a pain. Other INSERT/UPDATE operations are seldom performed,
they involve smaller tables with no gin index.

> > If HW can justify such huge difference I'll devote my time to
> > other problems.

> Partially, yes, but not that much I think.

That's my worry... but still in many circumstances pg performs
better than the MS SQL box... yeah... on pretty different
workload... but while on other workloads pg is a bit faster (20%
to 100% faster) even if it is on an older box, on this one is very
slow.

> > I'd say that a potential culprit could be the gin index. No
> > matter if the tsvector is updated or not, if the row is changed
> > I think the index is going to be updated anyway.

> gin indexes require relatively much RAM. If you didn't assign much
> in your settings then it's quite possible that the database can't
> keep the index in memory or that things have to spill to disk.
> Leave enough room for the OS's disk cache though, Postgres
> benefits from that as well.

> Is there any particular reason you went with a gin index and not a
> gist one? Gin can be faster, but consumes (much) more memory, but
> gist is also quite good with text searches and doesn't require
> quite as much memory.

gin index is doing a very good work and well full text searches are
the typical workload of that box and the one that is more important
to be fast.

I'd say if gin was occupying so much memory performances wouldn't be
so good on a daily basis.

I'd post excerpt of my postgres.conf (what's important here?) and
see if anything can be improved for *this* workload and temporary
tune the DB for this exceptional update still I'm going to continue
to be a bit surprised of such a huge difference even if it will come
out that it was actually a .conf not suited for this workload.

BTW the box is running Apache and php.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2010-09-28 21:49:10 Re: pg_upgrade
Previous Message David Fetter 2010-09-28 21:03:15 Commitfest: The Good, The Bad, and the Ugly