Re: Serious performance problem

From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: "Tille, Andreas" <TilleA(at)rki(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serious performance problem
Date: 2001-10-29 17:31:54
Message-ID: 20011029113154.B21886@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andreas -
I took a look at your problem, since I'm sort of in the field,
and would liek to see free solutions spread, as well.

Here's what I see: Your example touches on what can be an achilles
heel for pgsql's current statistical analyzer: selection on data fields
that have a few common values. Often, the indices don't get used, since
a large fraction of the table needs to be scanned, in any case. In
your example, fully 68% of the table fits the where condition.

Here's some timing results on my machine:

Your dataset and query, as written:

real 0m25.272s
user 0m0.090s
sys 0m0.050s

Creating an index on meldekategorie, and forcing it's use with
"set enable_seqscan = off"

real 0m14.743s
user 0m0.070s
sys 0m0.050s

Same, with index on istaktuell:

real 0m26.511s
user 0m0.050s
sys 0m0.060s

Now, with an index on both meldekategorie and istaktuell:

real 0m7.179s
user 0m0.060s
sys 0m0.030s

I think we have a winner. No it's not sub-second, but I improved the time
by 3x just by trying some indices. Note that I _still_ had to force the
use of indices for this one. It's also the first time I've personally seen
a query/dataset that benefits this much from a two-key index.

As another poster replied to you, there is limitation with postgresql's
use of indices that arises from MVCC: even if the only data requested is
that stored in the index itself, the backend must visit the actual tuple
in the table to ensure that it is 'visible' to the current transaction.

How realistic a representation of your real workload is this query? Realize
that more selective, complex queries are where pgsql shines compared to
other RDBMS: the 'fast table scanner' type query that you proposed as your
test don't really let pgsql stretch it's legs. Do you have example timings
from MS-SQL or others?

Ross

On Mon, Oct 29, 2001 at 01:43:37PM +0100, Tille, Andreas wrote:
> Hello,
>
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL server heavily on postgres-general list. The thread starts with
> message
>
> http://fts.postgresql.org/db/mw/msg.html?mid=1035557
>
> Now I tried a snapshot of version 7.2 and got an increase of speed of
> about factor 2. But sorry this is really not enough. The very simple
> test I pointed to in my mail is even much to slow and the issue would
> probably spoil down the whole project which should be a complete open
> source solution and would perhaps and in any M$ stuff. I?ve got under
> heavy preasur from my employer who was talking about the nice world
> of MS .net (while he is using MS-SQL exclusively). To make the thing
> clear the issue is the gal database of infectious diseases in Germany
> runned by the Robert Koch-Institute. So the beast could be of some
> importance for increasing the acceptance of PostgreSQL and Open Source
> in the field of medicine which is generally known for the money which
> is involved in. So I really hope that some skilled programmers would
> be able to find a good way to solve the performance issue perhaps by
> just profiling the simple query
>
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> to the data set I put on
>
> http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
>
> If this should take less than half a second on a modern PC I could
> continue to try mo realistic queries.
>
> I really hope that I could readjust the image of PostgreSQL in the
> eyes of my M$-centered colleagues.
>
> Kind regards
>
> Andreas.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Executive Director phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics fax: 713-348-6182
Rice University MS-39
Houston, TX 77005

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-10-29 17:44:09 Re: ecpg - GRANT bug
Previous Message mlw 2001-10-29 17:30:15 Re: External Database Connection