I took a look at your problem, since I'm sort of in the field,
and would liek to see free solutions spread, as well.
Here's what I see: Your example touches on what can be an achilles
heel for pgsql's current statistical analyzer: selection on data fields
that have a few common values. Often, the indices don't get used, since
a large fraction of the table needs to be scanned, in any case. In
your example, fully 68% of the table fits the where condition.
Here's some timing results on my machine:
Your dataset and query, as written:
Creating an index on meldekategorie, and forcing it's use with
"set enable_seqscan = off"
Same, with index on istaktuell:
Now, with an index on both meldekategorie and istaktuell:
I think we have a winner. No it's not sub-second, but I improved the time
by 3x just by trying some indices. Note that I _still_ had to force the
use of indices for this one. It's also the first time I've personally seen
a query/dataset that benefits this much from a two-key index.
As another poster replied to you, there is limitation with postgresql's
use of indices that arises from MVCC: even if the only data requested is
that stored in the index itself, the backend must visit the actual tuple
in the table to ensure that it is 'visible' to the current transaction.
How realistic a representation of your real workload is this query? Realize
that more selective, complex queries are where pgsql shines compared to
other RDBMS: the 'fast table scanner' type query that you proposed as your
test don't really let pgsql stretch it's legs. Do you have example timings
from MS-SQL or others?
On Mon, Oct 29, 2001 at 01:43:37PM +0100, Tille, Andreas wrote:
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL server heavily on postgres-general list. The thread starts with
> Now I tried a snapshot of version 7.2 and got an increase of speed of
> about factor 2. But sorry this is really not enough. The very simple
> test I pointed to in my mail is even much to slow and the issue would
> probably spoil down the whole project which should be a complete open
> source solution and would perhaps and in any M$ stuff. I?ve got under
> heavy preasur from my employer who was talking about the nice world
> of MS .net (while he is using MS-SQL exclusively). To make the thing
> clear the issue is the gal database of infectious diseases in Germany
> runned by the Robert Koch-Institute. So the beast could be of some
> importance for increasing the acceptance of PostgreSQL and Open Source
> in the field of medicine which is generally known for the money which
> is involved in. So I really hope that some skilled programmers would
> be able to find a good way to solve the performance issue perhaps by
> just profiling the simple query
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
> to the data set I put on
> If this should take less than half a second on a modern PC I could
> continue to try mo realistic queries.
> I really hope that I could readjust the image of PostgreSQL in the
> eyes of my M$-centered colleagues.
> Kind regards
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Executive Director phone: 713-348-6166
Gulf Coast Consortium for Bioinformatics fax: 713-348-6182
Rice University MS-39
Houston, TX 77005
In response to
pgsql-hackers by date
|Next:||From: Bruce Momjian||Date: 2001-10-29 17:44:09|
|Subject: Re: ecpg - GRANT bug|
|Previous:||From: mlw||Date: 2001-10-29 17:30:15|
|Subject: Re: External Database Connection|