bad selectivity estimates for CASE

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: bad selectivity estimates for CASE
Date: 2009-01-06 03:15:29
Message-ID: 603c8f070901051915p15b3d6a7jbe6f6fdd178143e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

While looking at a complex query that is being poorly planned by
PostgreSQL 8.2.9, I discovered that any non-trivial CASE...WHEN
expression seems to produce a selectivity estimate of 0.005. This
also happens on HEAD.

psql (8.4devel)
Type "help" for help.

head=# create table tenk (c) as select generate_series(1,10000);
SELECT
head=# alter table tenk alter column c set statistics 100;
ALTER TABLE
head=# analyze tenk;
ANALYZE
head=# explain select * from tenk where c in (1,2,3,4);
QUERY PLAN
------------------------------------------------------
Seq Scan on tenk (cost=0.00..190.00 rows=4 width=4)
Filter: (c = ANY ('{1,2,3,4}'::integer[]))
(2 rows)

head=# explain select * from tenk where case when c in (1,2,3,4) then 1 end = 1;
QUERY PLAN

--------------------------------------------------------------------------------
------------
Seq Scan on tenk (cost=0.00..215.00 rows=50 width=4)
Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 1 ELSE NULL::integ
er END = 1)
(2 rows)

head=# explain select * from tenk where case when c in (1,2,3,4) then 2 end = 1;
QUERY PLAN

--------------------------------------------------------------------------------
------------
Seq Scan on tenk (cost=0.00..215.00 rows=50 width=4)
Filter: (CASE WHEN (c = ANY ('{1,2,3,4}'::integer[])) THEN 2 ELSE NULL::integ
er END = 1)
(2 rows)

head=# \q

The last example is particularly egregious, since it can never return
true, but the previous example is not much better, since in my actual
query the actual selectivity (against a CASE with multiple WHEN
branches) can be as high as ~0.8, so a value of 0.005 isn't close. It
ends up causing a very expensive nested loop plan when something else
would be better.

Any suggestions would be appreciated.

...Robert

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-01-06 04:40:43 Re: bad selectivity estimates for CASE
Previous Message david 2009-01-01 20:40:11 Re: Poor plan choice in prepared statement