Loose Index Scans by Planner?

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Loose Index Scans by Planner?
Date: 2012-08-24 16:20:21
Message-ID: 5037A9C5.4030701@optionshouse.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Maybe I should post this in Hackers instead, but I figured I'd start
here to avoid cluttering up that list.

So, we know we have a way of doing a loose index scan with CTEs:

http://wiki.postgresql.org/wiki/Loose_indexscan

But that got me wondering. The planner knows from pg_stats that col1
could have low cardinality. If that's the case, and a WHERE clause uses
a two column index, and col2 is specified, why can't it walk each
individual bucket in the two-column index, and use col2? So I forced
such a beast with a CTE:

WITH RECURSIVE t AS (
SELECT min(col1) AS col1
FROM tablename
UNION ALL
SELECT (SELECT min(col1)
FROM tablename
WHERE col1 > t.col1)
FROM t
WHERE t.col1 IS NOT NULL
)
SELECT p.*
FROM t
JOIN tablename p USING (col1)
where p.col2 = 12345

I ask, because while the long-term fix would be to re-order the index to
(col2, col1), this seems like a situation the planner could easily
detect and compensate for. In our particular example, execution time
went from 160ms to 2ms with the CTE rewrite. This is a contrived
example, but it seems like loose index scans would be useful in other
ways. Heck, this:

SELECT DISTINCT col1
FROM tablename;

Has terrible performance because it always seems to revert to a sequence
scan, but it's something people do *all the time*. I can't reasonably
expect all of my devs to switch to that admittedly gross CTE to get a
faster effect, so I'm just thinking out loud.

Until PG puts in something to fix this, I plan on writing a stored
procedure that writes a dynamic CTE and returns a corresponding result
set. It's not ideal, but it would solve our particular itch. Really,
this should be possible with any indexed column, so I might abstract it.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-08-24 17:13:45 Re: default_isolation_level='serializable' crashes on Windows
Previous Message Heikki Linnakangas 2012-08-24 15:51:20 Re: Statistics and selectivity estimation for ranges

Browse pgsql-performance by date

  From Date Subject
Next Message Artur Zając 2012-08-24 18:46:42 NOTIFY performance
Previous Message Felix Schubert 2012-08-24 09:47:45 Slow Performance on a XEON E5504