Skip site navigation (1) Skip section navigation (2)

Odd select behavior -- statistics, redux (7.0.x and devel)

From: mlw <markw(at)mohawksoft(dot)com>
To: Hackers List <pgsql-hackers(at)postgresql(dot)org>
Subject: Odd select behavior -- statistics, redux (7.0.x and devel)
Date: 2000-11-30 12:26:27
Message-ID: 3A264773.B3CDA525@mohawksoft.com (view raw or flat)
Thread:
Lists: pgsql-hackers
When Postgres is fast, it is really fast. I love it. My biggest problem
is when/how it chooses best path, it seems to me that relatively few
records with a high duplication destroy performance. I can't stress
enough that this is a serious problem in the real world.

Take these two queries:

cdinfo=# explain select trackid, song, title   from zsong, ztitles where
ztitles.performer2 like 'Van Halen' and ztitles.muzenbr= zsong.muzenbr
and contains(song, 'panama', 10)>0;
NOTICE:  QUERY PLAN:
 
Merge Join  (cost=6012.55..182151.93 rows=10902 width=36)
  ->  Sort  (cost=6012.55..6012.55 rows=3130 width=16)
        ->  Index Scan using ztitles_performer2_ndx on ztitles 
(cost=0.00..5830.80 rows=3130 width=16)
  ->  Index Scan using zsong_muzenbr_ndx on zsong  (cost=0.00..166961.86
rows=731071 width=20)

cdinfo=# explain select trackid, song, title   from zsong, ztitles where
ztitles.title like 'Van Halen' and ztitles.muzenbr = zsong.muzenbr and
contains(song, 'panama', 10)>0;
NOTICE:  QUERY PLAN:
 
Nested Loop  (cost=0.00..93.45 rows=4 width=36)
  ->  Index Scan using ztitles_title_ndx on ztitles  (cost=0.00..7.08
rows=1 width=16)
  ->  Index Scan using zsong_muzenbr_ndx on zsong  (cost=0.00..78.43
rows=7 width=20)  

They are fundamentally the same query, each with an index, each doing
about the same thing. Except that the performer2 field has a high number
of duplicate records ala "Various Artists"

Now we have had some small debates about how to fix this, and perhaps I
am over simplifying it, but the current statistics are broken, they do
not work reliably and produce unreliable results. I think this is a must
for 7.1. A simple hack, such as discarding the upper and lower %5-10%
should be able to fix this behavior, without too many side effects (if
any). While I agree it is not the "right" way to do something, it would
be a "better" way of doing something that is currently wrong.

With the exception of this problem, I love postgres, but this problem
really goes a long way to make it look REAL bad.

BTW anyone know a way around this?

-- 
http://www.mohawksoft.com

pgsql-hackers by date

Next:From: Arno A. KarnerDate: 2000-11-30 13:04:34
Subject: compiling pg 7.0.3 on sco 5.0.5
Previous:From: Manish VigDate: 2000-11-30 10:51:21
Subject:

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group