Re: Index Skip Scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-11-02 20:32:08
Message-ID: CAH2-Wzmg-4ScB8kpDeQK44FitnNf=vzg97qHvNa0skO3S5Yj2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 2, 2019 at 11:56 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Wed, Sep 25, 2019 at 2:33 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > v27-0001-Index-skip-scan.patch
>
> Some random thoughts on this:

And now some more:

* I'm confused about this code in _bt_skip():

> /*
> * Andvance backward but read forward. At this moment we are at the next
> * distinct key at the beginning of the series. In case if scan just
> * started, we can read forward without doing anything else. Otherwise find
> * previous distinct key and the beginning of it's series and read forward
> * from there. To do so, go back one step, perform binary search to find
> * the first item in the series and let _bt_readpage do everything else.
> */
> else if (ScanDirectionIsBackward(dir) && ScanDirectionIsForward(indexdir))
> {
> if (!scanstart)
> {
> _bt_drop_lock_and_maybe_pin(scan, &so->currPos);
> offnum = _bt_binsrch(scan->indexRelation, so->skipScanKey, buf);
>
> /* One step back to find a previous value */
> _bt_readpage(scan, dir, offnum);

Why is it okay to call _bt_drop_lock_and_maybe_pin() like this? It
looks like that will drop the lock (but not the pin) on the same
buffer that you binary search with _bt_binsrch() (since the local
variable "buf" is also the buf in "so->currPos").

* It also seems a bit odd that you assume that the scan is
"scan->xs_want_itup", but then check that condition many times. Why
bother?

* Similarly, why bother using _bt_drop_lock_and_maybe_pin() at all,
rather than just unlocking the buffer directly? We'll only drop the
pin for a scan that is "!scan->xs_want_itup", which is never the case
within _bt_skip().

I think that the macros and stuff that manage pins and buffer locks in
nbtsearch.c is kind of a disaster anyway [1]. Maybe there is some
value in trying to be consistent with existing nbtsearch.c code in
ways that aren't strictly necessary.

* Not sure why you need this code after throwing an error:

> else
> {
> elog(ERROR, "Could not read closest index tuples: %d", offnum);
> pfree(so->skipScanKey);
> so->skipScanKey = NULL;
> return false;
> }

[1] https://www.postgresql.org/message-id/flat/CAH2-Wz=m674-RKQdCG+jCD9QGzN1Kcg-FOdYw4-j+5_PfcHbpQ(at)mail(dot)gmail(dot)com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Павел Ерёмин 2019-11-02 20:35:09 Re: 64 bit transaction id
Previous Message Justin Pryzby 2019-11-02 20:26:17 Re: bitmaps and correlation