Should the optimiser convert a CASE into a WHERE if it can?

From: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Should the optimiser convert a CASE into a WHERE if it can?
Date: 2010-01-26 17:10:26
Message-ID: 4B5F2202.3050107@cam.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dear All,

Just wondering whether there is a missing scope for the query planner
(on 8.4.2) to be cleverer than it currently is.

Specifically, I wonder whether the optimiser should know that by
converting a CASE condition into a WHERE condition, it can use an index.

Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard

Example:
--------

In this example, tbl_tracker has 255751 rows, with a primary key "id",
whose values lie uniformly in the range 1...1255750.

If one is trying to count multiple conditions, the following query seems
to be the most obvious way to do it:

SELECT
SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
FROM tbl_tracker;

c1 | c2 | c3 | c4 | c5
------+------+------+------+------
2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms

This can be manually optimised into a far uglier (but much much faster)
query:

SELECT * FROM
(SELECT COUNT (1) AS c1 FROM tbl_tracker
WHERE id > 1200000 and id < 1210000) AS s1,
(SELECT COUNT (1) AS c2 FROM tbl_tracker
WHERE id > 1210000 and id < 1220000) AS s2,
(SELECT COUNT (1) AS c3 FROM tbl_tracker
WHERE id > 1220000 and id < 1230000) AS s3,
(SELECT COUNT (1) AS c4 FROM tbl_tracker
WHERE id > 1230000 and id < 1240000) AS s4,
(SELECT COUNT (1) AS c5 FROM tbl_tracker
WHERE id > 1240000 and id < 1250000) AS s5

c1 | c2 | c3 | c4 | c5
------+------+------+------+------
2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms

Debugging
---------

The simple queries are:

SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end)
from tbl_tracker;

Time: 174.804 ms

Explain shows that this does a sequential scan.

SELECT COUNT(1) from tbl_tracker WHERE id > 1200000 and id < 1210000;

Time: 4.153 ms

Explain shows that this uses the index, as expected.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2010-01-26 17:21:38 Re: Should the optimiser convert a CASE into a WHERE if it can?
Previous Message Tom Lane 2010-01-26 16:41:54 Re: Poor query plan across OR operator