Re: Loose Index Scans by Planner?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <sthomas(at)optionshouse(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Loose Index Scans by Planner?
Date: 2012-08-24 19:40:57
Message-ID: 503792790200002500049B56@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:

> So, we know we have a way of doing a loose index scan with CTEs:
>
> http://wiki.postgresql.org/wiki/Loose_indexscan

I tried this on a table in production with 23 million rows for a
column with 45 distinct values which is the high-order column of a
four-column index. This ran in 445 ms first time and 2 ms on the
second and subsequent tries. The equivalent SELECT DISTINCT ran in
30 seconds first time, and got down to 11.5 seconds after a few
runs. So roughly two orders of magnitude faster with a cold cache
and three orders of magnitude faster with a warm cache.

That sure would be a nice optimization to have in the planner.

> But that got me wondering. The planner knows from pg_stats that
> col1 could have low cardinality. If that's the case, and a WHERE
> clause uses a two column index, and col2 is specified, why can't
> it walk each individual bucket in the two-column index, and use
> col2? So I forced such a beast with a CTE:
>
> WITH RECURSIVE t AS (
> SELECT min(col1) AS col1
> FROM tablename
> UNION ALL
> SELECT (SELECT min(col1)
> FROM tablename
> WHERE col1 > t.col1)
> FROM t
> WHERE t.col1 IS NOT NULL
> )
> SELECT p.*
> FROM t
> JOIN tablename p USING (col1)
> where p.col2 = 12345
>
> I ask, because while the long-term fix would be to re-order the
> index to (col2, col1), this seems like a situation the planner
> could easily detect and compensate for. In our particular example,
> execution time went from 160ms to 2ms with the CTE rewrite.

Well, that'd be the icing on the cake. I'd be overjoyed to get the
cake. :-)

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2012-08-24 21:16:15 PATCH: pgbench - random sampling of transaction written into log
Previous Message Brendan Byrd 2012-08-24 18:24:48 Minor "pre-bug" in gram.y for DROP INDEX CONCURRENTLY IF_P EXISTS

Browse pgsql-performance by date

  From Date Subject
Next Message Artur Zając 2012-08-24 21:11:08 Re: NOTIFY performance
Previous Message Merlin Moncure 2012-08-24 19:12:05 Re: NOTIFY performance