Re: SELECT DISTINCT never uses an index?

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT DISTINCT never uses an index?
Date: 2016-07-07 22:06:43
Message-ID: CAEepm=25w4tcignLyNJnfAqz4gudi89DeXgiquU1-kuF2w5Cow@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 8, 2016 at 9:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 7, 2016 at 4:56 PM, Bill Moran <wmoran(at)potentialtech(dot)com> wrote:
>> SELECT DISTINCT size FROM grue;
>>
>> always does a seq scan on Postgres 9.5.2 on Ubuntu 14. (Yes, I know
>> we're a patch behind; the upgrade is on the schedule.)
>>
>> I would expect it to be possible, and significantly more
>> efficient, to do an index scan for that query.
>
> [...]
>
> We're probably missing a few tricks on queries of this type. If the
> index-traversal machinery had a mechanism to skip quickly to the next
> distinct value, that could be used here: walk up the btree until you
> find a page that contains keyspace not equal to the current key, then
> walk back down until you find the first leaf page that contains such a
> value. That would potentially let you step over large chunks of the
> index without actually examining all the leaf pages, which for a query
> like this seems like it could be a big win.

FWIW I messed around with prototyping this idea here:

https://www.postgresql.org/message-id/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww@mail.gmail.com

I hope to return to that and some related ideas eventually as I learn
more about the relevant areas of the source code, if someone doesn't
beat me to it.
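
To make the skipping idea quoted above a bit more concrete, here is a
toy sketch in Python. It is not the prototype patch and has nothing to
do with the real btree code: a sorted list stands in for the index, and
a binary search stands in for "walk up, then back down to the first
larger key". The function name distinct_by_skipping is made up for
illustration.

```python
from bisect import bisect_right


def distinct_by_skipping(sorted_keys):
    """Collect distinct values from an already sorted list.

    Assumption: sorted_keys plays the role of the index's key order.
    Instead of visiting every entry, each step jumps straight past all
    duplicates of the current key.
    """
    distinct = []
    pos = 0
    while pos < len(sorted_keys):
        key = sorted_keys[pos]
        distinct.append(key)
        # Binary search for the first position holding a key greater
        # than the current one, looking only to the right of pos.
        pos = bisect_right(sorted_keys, key, lo=pos)
    return distinct


if __name__ == "__main__":
    print(distinct_by_skipping([1, 1, 1, 2, 2, 5, 5, 5, 5, 9]))
```

With d distinct values among n entries this does roughly d binary
searches rather than n comparisons, which is where the win would come
from when there are few distinct values and many duplicates.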

https://wiki.postgresql.org/wiki/Loose_indexscan shows a recursive CTE
that does the same thing at a higher level.
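
For anyone who wants to play with the CTE approach without a Postgres
instance handy, here is a rough sketch that runs a similar recursive
query through Python's sqlite3 module. It is adapted for SQLite and is
not the wiki's exact query. The grue table and size column come from
Bill's example; the index name grue_size_idx and the helper names
make_demo_db and distinct_sizes are made up for illustration. Whether
each step really turns into a cheap index probe is up to the planner.

```python
import sqlite3

# Each recursive step asks for the smallest size strictly greater than
# the previous one. When nothing larger exists the scalar subquery
# yields NULL, and the WHERE clause then stops the recursion.
LOOSE_SCAN_SQL = """
WITH RECURSIVE t(size) AS (
    SELECT min(size) FROM grue
    UNION ALL
    SELECT (SELECT g.size FROM grue AS g
            WHERE g.size > t.size
            ORDER BY g.size LIMIT 1)
    FROM t
    WHERE t.size IS NOT NULL
)
SELECT size FROM t WHERE size IS NOT NULL ORDER BY size
"""


def make_demo_db(sizes):
    """Build an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grue (size INTEGER)")
    conn.execute("CREATE INDEX grue_size_idx ON grue (size)")
    conn.executemany("INSERT INTO grue VALUES (?)", [(s,) for s in sizes])
    return conn


def distinct_sizes(conn):
    """Return the distinct sizes found by the recursive query."""
    return [row[0] for row in conn.execute(LOOSE_SCAN_SQL)]


if __name__ == "__main__":
    print(distinct_sizes(make_demo_db([3, 1, 3, 2, 1, 1, 7])))
```

The query issues one small "next larger value" lookup per distinct
value, so it only pays off when the number of distinct values is small
compared to the table.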

--
Thomas Munro
http://www.enterprisedb.com
