Re: Index Skip Scan

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Floris Van Nee <florisvannee(at)optiver(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2019-09-05 19:20:06
Message-ID: CA+q6zcXO-XzM2Be7ZX8SZf7Xr-Hw5gMRXrtiMkUoivpF8J-9DA@mail.gmail.com
Lists: pgsql-hackers

> On Mon, Sep 2, 2019 at 3:28 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>
> > On Wed, Aug 28, 2019 at 9:32 PM Floris Van Nee <florisvannee(at)optiver(dot)com> wrote:
> >
> > I'm afraid I did manage to find another incorrect query result though
>
> Yes, it's an example of what I was mentioning before, that the current modified
> implementation of `_bt_readpage` wouldn't work well in case of going between
> pages. So far it seems that the only problem we can have is when previous and
> next items located on a different pages. I've checked how this issue can be
> avoided, I hope I will post a new version relatively soon.

Here is a version in which stepping between pages works better. It seems
sufficient to fix the case you mentioned, but it requires propagating the
keepPrev logic through `_bt_steppage` and `_bt_readnextpage`, and I can't say I
like this solution. I think it might be simpler to teach the code that runs
after index_skip not to call `_bt_next` immediately after a skip has happened.
That should immediately eliminate several hacks from index skip itself, so I'll
try to pursue this idea.
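
To make the "no `_bt_next` right after a skip" idea concrete, here is a toy
sketch in C. It is only a model over a sorted array and not code from the
patch: ToyScan, toy_skip, toy_next, toy_distinct and the step_after_skip flag
are all hypothetical names invented for this illustration, and nothing here
touches pages or the real nbtree functions.

```c
#include <stddef.h>

/* Toy stand-in for an ordered index scan: a cursor over a sorted array.
 * Hypothetical type, not a PostgreSQL structure. */
typedef struct ToyScan
{
    const int  *keys;   /* sorted keys, duplicates allowed */
    size_t      nkeys;
    size_t      pos;    /* current cursor position */
} ToyScan;

/* Rough analogue of a skip: move the cursor to the first key that is
 * greater than the current one. Returns 0 if there is no such key. */
static int
toy_skip(ToyScan *scan)
{
    int     cur = scan->keys[scan->pos];
    size_t  lo = scan->pos + 1;
    size_t  hi = scan->nkeys;

    /* binary search for the first element > cur in (pos, nkeys) */
    while (lo < hi)
    {
        size_t  mid = lo + (hi - lo) / 2;

        if (scan->keys[mid] <= cur)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo >= scan->nkeys)
        return 0;
    scan->pos = lo;
    return 1;
}

/* Rough analogue of stepping to the next item. Returns 0 at the end. */
static int
toy_next(ToyScan *scan)
{
    if (scan->pos + 1 >= scan->nkeys)
        return 0;
    scan->pos++;
    return 1;
}

/* Collect distinct keys into out (which must hold at least nkeys items).
 * With step_after_skip = 0 the caller returns the item the skip landed
 * on; with step_after_skip = 1 it also steps once more, which passes
 * over the item the skip has just found. */
static size_t
toy_distinct(const int *keys, size_t nkeys, int step_after_skip, int *out)
{
    ToyScan scan = {keys, nkeys, 0};
    size_t  n = 0;

    if (nkeys == 0)
        return 0;

    out[n++] = scan.keys[scan.pos];     /* first item of the scan */
    while (toy_skip(&scan))
    {
        if (step_after_skip && !toy_next(&scan))
            break;
        out[n++] = scan.keys[scan.pos];
    }
    return n;
}
```

In this model, for the keys {1, 1, 2, 3, 3} the variant that does not step
after a skip returns 1, 2, 3, while the variant that steps returns only 1, 3.
The real problem in the patch involves page boundaries, which this sketch does
not attempt to model.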

> On Wed, Sep 4, 2019 at 10:45 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

Thank you for checking it out!

> Surely it isn't right to add members prefixed with "ioss_" to
> struct IndexScanState.

Yeah, sorry. I originally incorporated IndexScan support only to show that
it's possible (with some limitations), but then forgot to clean up. Those
fields are now renamed.

> I'm surprised about this "FirstTupleEmitted" business. Wouldn't it make
> more sense to implement index_skip() to return the first tuple if the
> scan is just starting? (I know little about executor, apologies if this
> is a stupid question.)

I'm not entirely sure which part exactly you mean. Currently the first tuple is
returned by `_bt_first`; how would it help if index_skip returned it instead?

> It would be good to get more knowledgeable people to review this patch.
> It's clearly something we want, yet it's been there for a very long
> time.

Sure, that would be nice.

Attachment Content-Type Size
v25-0001-Index-skip-scan.patch application/octet-stream 85.1 KB
