Re: Index Skip Scan

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: 9erthalion6(at)gmail(dot)com
Cc: jesper(dot)pedersen(at)redhat(dot)com, a(dot)kuzmenkov(at)postgrespro(dot)ru, pg(at)bowt(dot)ie, thomas(dot)munro(at)enterprisedb(dot)com, bhushan(dot)uparkar(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, a(dot)korotkov(at)postgrespro(dot)ru, jtc331(at)gmail(dot)com
Subject: Re: Index Skip Scan
Date: 2019-01-31 06:31:53
Message-ID: 20190131.153153.83078698.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote in <CA+q6zcVP18wYiO=aa+fz3GuncuTF52q1sufB7ise37TJPSDK1w(at)mail(dot)gmail(dot)com>
> A bit of adjustment after nodes/relation -> nodes/pathnodes.

I had a look on this.

The name "index skip scan" is a different feature from the
feature with the name on other prodcuts, which means "index scan
with postfix key (of mainly of multi column key) that scans
ignoring the prefixing part" As Thomas suggested I'd suggest that
we call it "index hop scan". (I can accept Hopscotch, either:p)

Also as mentioned upthread by Peter Geoghegan, this could easly
give worse plan by underestimation. So I also suggest that this
has dynamic fallback function. In such perspective it is not
suitable for AM API level feature.

If all leaf pages are on the buffer and the average hopping
distance is less than (expectedly) 4 pages (the average height of
the tree), the skip scan will lose. If almost all leaf pages are
staying on disk, we could win only by 2-pages step (skipping over
one page).

=====
As I'm writing the above, I came to think that it's better
implement this as an pure executor optimization.

Specifically, let _bt_steppage count the ratio of skipped pages
so far then if the ratio exceeds some threshold (maybe around
3/4) it gets into hopscotching mode, where it uses index scan to
find the next page (rather than traversing). As mentioned above,
I think using skip scan to go beyond the next page is a good
bet. If the success ration of skip scans goes below some
threshold (I'm not sure for now), we should fall back to
traversing.

Any opinions?

====

Some comments on the patch below.

+ skip scan approach, which is based on the idea of
+ <ulink url="https://wiki.postgresql.org/wiki/Free_Space_Map_Problems">
+ Loose index scan</ulink>. Rather than scanning all equal values of a key,
+ as soon as a new value is found, it will search for a larger value on the

I'm not sure it is a good thing to put a pointer to rather
unstable source in the documentation.

This adds a new AM method but it seems avaiable only for ordered
indexes, specifically btree. And it seems that the feature can be
implemented in btgettuple since btskip apparently does the same
thing. (I agree to Robert in the point in [1]).

[1] https://www.postgresql.org/message-id/CA%2BTgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr%3DOK5_kC_SSg%40mail.gmail.com

Related to the above, it seems better that the path generation of
skip scan is a part of index scan. Whether skip scan or not is a
matter of index scan itself.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-01-31 06:47:59 Re: Unused parameters & co in code
Previous Message Arseny Sher 2019-01-31 06:21:59 Too rigorous assert in reorderbuffer.c