Re: DISTINCT with btree skip scan

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: DISTINCT with btree skip scan
Date: 2016-11-23 21:19:37
Message-ID: CAEepm=2r5zqNfNvJ6fRoPX9K9nqtPqNUTKG_OucNeBMnaaB9gQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Here it is, now split into two parts: one patch
> to add an amskip operation, and another to consider using it to
> implement DISTINCT when it was already has an index only scan path on
> an index that supports skipping.

Those patches add an amskip operation and then teach the planner and
executor how to do this, given a table foo (a, b) with an index on (a)
or (a, b):

SELECT DISTINCT foo FROM a

Index Only Scan for distinct values of (a) using foo_pkey on foo

In just the right circumstances that is vastly faster than scanning
every tuple and uniquifying. I was thinking that the next step in
this experiment might be to introduce a kind of plan that can handle
queries where the index's leading column is not in the WHERE clause,
building on the above plan, and behaving conceptually a little bit
like a Nested Loop, like so:

SELECT * FROM foo WHERE b = 42

Index Skip Scan
-> Index Only Scan for distinct values of (a) using foo_pkey on foo
-> Index Only Scan using foo_pkey on foo
Index Cond: ((a = outer.a) AND (b = 42))

I suppose you could instead have a single node that knows how to
perform the whole scan itself so that you don't finish up searching
for each distinct value in both the inner and outer subplan, but it
occurred to me that building the plan out of Lego bricks like this
might generalise better to more complicated queries. I suppose it
might also parallelise nicely with a Parallel Index Only Scan as the
outer plan.

Maybe something similar is possible for handling "GROUP BY a".

Worth pursuing? Does amskip suck? Does anyone have better ideas,
either for how to do the low level skip or the higher level Index Skip
Scan, or perhaps a completely different way of looking at this?

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-11-23 21:28:18 Re: Calculation of param_source_rels in add_paths_to_joinrel
Previous Message Maeldron T. 2016-11-23 19:45:30 Re: Google Cloud Compute + FreeBSD + PostgreSQL = timecounter issue