Bug in query planer ?

From: Clifford Wolf <clifford(dot)wolf(at)linbit(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bug in query planer ?
Date: 2006-01-31 17:59:42
Message-ID: 200601311859.42586.clifford.wolf@linbit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

philipp resiner wrote a mail about this problem yesterday. I've now traced it down to something that looks like a bug in the
query planer to me. Please have a look at this and let me know if this is a bug or I am compleatly wrong..

(this is done right after a complete ANALYZE over the database, so the statistics are up-to-date)

sd-beta=> select n_tup_ins, n_tup_del from pg_stat_user_tables where relname = 'contractelements';
n_tup_ins | n_tup_del
-----------+-----------
91821 | 0
(1 row)

sd-beta=> select n_distinct, most_common_vals, most_common_freqs from pg_stats where tablename = 'contractelements' and attname = 'isactiv';
n_distinct | most_common_vals | most_common_freqs
------------+------------------+----------------------
2 | {Y,N} | {0.966467,0.0335333}
(1 row)

sd-beta=> explain analyze select 1 from contractelements where isActiv = 'Y';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on contractelements (cost=0.00..4963.76 rows=88742 width=0) (actual time=0.014..137.930 rows=88838 loops=1)
Filter: ((isactiv)::text = 'Y'::text)
Total runtime: 153.543 ms
(3 rows)

The query planner estimates that isActiv = 'Y' will match 88742 rows. This is reasonable (91821 * 0.966467 = 88741.966407) and
correct. However, the following case causes some troubles:

sd-beta=> explain analyze select 1 from contractelements where upper(isActiv) = 'Y';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on contractelements (cost=0.00..5193.32 rows=459 width=0) (actual time=0.030..198.493 rows=88838 loops=1)
Filter: (upper((isactiv)::text) = 'Y'::text)
Total runtime: 214.035 ms
(3 rows)

Here we match on upper(isActiv) = 'Y' (which is totally braindead, but the query is auto-generated by a customer-supplied
application, so I can not change it). Shouldn't the query planner execute upper(isActiv) for both values in pg_stats
and so come to the same conclusion as in the first case?

It doesn't. Led by this misapprehension the query planner generates pretty creative, but unfortunately very suboptimal
query plans.

A 'CREATE INDEX clifford_temp ON contractelements ( upper(isActiv) )' followed by an 'ANALYZE contractelements' solves the
problem in this particular case. But this is not a solution to the problem in general..

Shouldn't the query planner be able to do the right thing without the index? Where does the magic 'rows=459' come from?

yours,
- clifford

--
: Clifford Wolf                      Tel +43-1-8178292-00 :
: LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Juan Manuel Bigi 2006-01-31 18:55:27 BUG #2229: Parsing error in Functions Grant Wizard
Previous Message Michael Lush 2006-01-31 16:58:41 BUG #2228: escaped single quotes choke spi_exec_query in plperlu