Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, a(dot)kuzmenkov(at)postgrespro(dot)ru
Cc: pg(at)bowt(dot)ie, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, jtc331(at)gmail(dot)com
Subject: Re: Index Skip Scan
Date: 2018-11-17 23:27:07
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

> On Fri, 16 Nov 2018 at 16:06, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> On 11/15/18 6:41 AM, Alexander Kuzmenkov wrote:
> >>> But having this logic inside _bt_next means that it will make a
> >>> non-skip index
> >>> only scan a bit slower, am I right?
> >>
> >> Correct.
> >
> > Well, it depends on how the skip scan is implemented. We don't have to
> > make normal scans slower, because skip scan is just a separate thing.
> >
> > My main point was that current implementation is good as a proof of
> > concept, but it is inefficient for some data and needs some unreliable
> > planner logic to work around this inefficiency. And now we also have
> > execution-time fallback because planning-time fallback isn't good
> > enough. This looks overly complicated. Let's try to design an algorithm
> > that works well regardless of the particular data and doesn't need these
> > heuristics. It should be possible to do so for btree.
> >
> > Of course, I understand the reluctance to implement an entire new type
> > of btree traversal. Anastasia Lubennikova suggested a tweak for the
> > current method that may improve the performance for small groups of
> > equal values. When we search for the next unique key, first check if it
> > is contained on the current btree page using its 'high key'. If it is,
> > find it on the current page. If not, search from the root of the tree
> > like we do now. This should improve the performance for small equal
> > groups, because there are going to be several such groups on the page.
> > And this is exactly where we have the regression compared to unique +
> > index scan.
> Robert suggested something similar in [1]. I'll try and look at that
> when I'm back from my holiday.

Yeah, probably you're right. Unfortunately, I've misunderstood the previous
Robert's message in this thread with the similar approach. Jesper, I hope you
don't mind if I'll post an updated patch? _bt_skip is changed there in the
suggested way, so that it checks the current page before searching from the
root of a tree, and I've removed the fallback logic. After some
initial tests I see
that with this version skip scan over a table with 10^7 rows and 10^6
distinct values is slightly slower than a regular scan, but not that much.

> > By the way, what is the data for which we intend this feature to work?
> > Obviously a non-unique btree index, but how wide are the tuples, and how
> > big the equal groups? It would be good to have some real-world examples.
> Although my primary use-case is int I agree that we should test the
> different data types, and tuple widths.

My personal motivation here is exactly that we face use-cases for skip scan
from time to time. Usually it's quite few distinct values (up to a dozen or
so, which means that equal groups are quite big), but with the variety of types
and widths.

> On Thu, 15 Nov 2018 at 15:28, James Coleman <jtc331(at)gmail(dot)com> wrote:
> > Is skip scan only possible for index-only scan?
> I haven't see much discussion of this question yet. Is there a
> particular reason to lock ourselves into thinking about this only in
> an index only scan?

I guess, the only reason is to limit the scope of the patch.

Attachment Content-Type Size
0001-Index-skip-scan-v4.patch application/octet-stream 35.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2018-11-17 23:37:15 Re: pg11.1 jit segv
Previous Message Tomas Vondra 2018-11-17 23:11:54 Re: valgrind issues on Fedora 28