Re: DISTINCT with btree skip scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: DISTINCT with btree skip scan
Date: 2016-11-25 01:33:37
Message-ID: CA+Tgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr=OK5_kC_SSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 23, 2016 at 4:19 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Here it is, now split into two parts: one patch
>> to add an amskip operation, and another to consider using it to
>> implement DISTINCT when it was already has an index only scan path on
>> an index that supports skipping.
>
> Those patches add an amskip operation and then teach the planner and
> executor how to do this, given a table foo (a, b) with an index on (a)
> or (a, b):
>
> SELECT DISTINCT foo FROM a
>
> Index Only Scan for distinct values of (a) using foo_pkey on foo

Cool!

> In just the right circumstances that is vastly faster than scanning
> every tuple and uniquifying. I was thinking that the next step in
> this experiment might be to introduce a kind of plan that can handle
> queries where the index's leading column is not in the WHERE clause,
> building on the above plan, and behaving conceptually a little bit
> like a Nested Loop, like so:
>
> SELECT * FROM foo WHERE b = 42
>
> Index Skip Scan
> -> Index Only Scan for distinct values of (a) using foo_pkey on foo
> -> Index Only Scan using foo_pkey on foo
> Index Cond: ((a = outer.a) AND (b = 42))

Blech, that seems really ugly and probably inefficient.

> I suppose you could instead have a single node that knows how to
> perform the whole scan itself so that you don't finish up searching
> for each distinct value in both the inner and outer subplan, but it
> occurred to me that building the plan out of Lego bricks like this
> might generalise better to more complicated queries. I suppose it
> might also parallelise nicely with a Parallel Index Only Scan as the
> outer plan.

I think putting all the smarts in a single node is likely to be
better, faster, and cleaner.

This patch has gotten favorable comments from several people, but
somebody really needs to REVIEW it.

Oh, well, since I'm here...

+ if (ScanDirectionIsForward(dir))
+ {
+ so->currPos.moreLeft = false;
+ so->currPos.moreRight = true;
+ }
+ else
+ {
+ so->currPos.moreLeft = true;
+ so->currPos.moreRight = false;
+ }

The lack of comments makes it hard for me to understand what the
motivation for this is, but I bet it's wrong. Suppose there's a
cursor involved here and the user tries to back up. Instead of having
a separate amskip operation, maybe there should be a flag attached to
a scan indicating whether it should return only distinct results.
Otherwise, you're allowing for the possibility that the same scan
might sometimes skip and other times not skip, but then it seems hard
for the scan to survive cursor operations. Anyway, why would that be
useful?

+ if (ScanDirectionIsForward(dir))
+ offnum = OffsetNumberPrev(offnum);
+ if (!_bt_readpage(scan, dir, offnum))
+ {
+ if (!_bt_steppage(scan, dir))
+ {
+ return false;
+ }
+ }

As the TODO:TM comment at the top of the function sort of implies
without directly coming out and saying so, this is ugly and suggests
that you've picked the wrong API. If the idea I proposed above isn't
appealing, I still think we need to get rid of this by some other
means.

+ /*
+ * TODO:TM For now, we use _bt_search to search from the root; in theory
+ * we should be able to do a local traversal ie from the current page, but
+ * I don't know if it would actually be better in general.
+ */

I think that this depends a lot on the number of duplicates. If we
end up merely fast-forwarding within the same page to find the next
unique value, re-traversing from the root sucks. However, if the next
distinct key *isn't* on the same page, then traversing from the root
is a good bet. A typical btree is only a few levels deep (like 3) so
once you don't find what you're looking for on the same page it's
probably fairly sound to re-traverse from the root. Of course you
lose at least a little bit in the case where you would have found the
next distinct key within a page or two but in other cases you win big.
I wonder if that would be a suitable heuristic: first check the
current page, if all keys there are equal then retraverse from the
root.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2016-11-25 02:05:24 Re: macaddr 64 bit (EUI-64) datatype support
Previous Message Kyotaro HORIGUCHI 2016-11-25 01:24:47 Re: IF (NOT) EXISTS in psql-completion