Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2020-02-10 19:30:56
Message-ID: 20200210193056.l4slz4bspqv44gqu@localhost
Lists: pgsql-hackers

> On Sat, Feb 08, 2020 at 01:31:02PM -0500, James Coleman wrote:
> On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> >
> > > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
> > > So in this case the skip scan is ~15x slower than the usual plan (index
> > > only scan + unique). The reason why this happens is pretty simple - to
> > > estimate the number of groups we multiply the ndistinct estimates for
> > > the two columns (which both have n_distinct = 10000), but then we cap
> > > the estimate to 10% of the table. But when the columns are independent
> > > with high cardinalities that under-estimates the actual value, making
> > > the cost for skip scan much lower than it should be.
> >
> > The current implementation checks if we can find the next value on the
> > same page to do a shortcut instead of tree traversal and improve such
> > kind of situations, but I can easily imagine that it's still not enough
> > in some extreme situations.
>
> This is almost certainly rehashing already covered ground, but since I
> doubt it's been discussed recently, would you be able to summarize
> that choice (not to always get the next tuple by scanning from the top
> of the tree again) and the performance/complexity tradeoffs?

Yeah, this part of the discussion happened some time ago. The idea [1] is
to protect ourselves, at least partially, from incorrect ndistinct
estimates. Naively jumping over the index means that even if the next key
we're searching for is on the same page as the previous one, we still end
up doing a search from the root of the tree, which is of course less
efficient than simply checking the current page before jumping further.

The performance tradeoff in this case is simple: we make the regular use
case slightly slower, but perform better in the worst-case scenarios. The
complexity tradeoff was never discussed, but I guess everyone assumed it's
relatively straightforward to check the current page and return a match,
if one is found, before jumping.

[1]: https://www.postgresql.org/message-id/CA%2BTgmoY7QTHhzLWZupNSyyqFRBfMgYocg3R-6g%3DDRgT4-KBGqg%40mail.gmail.com
