Re: Index Skip Scan

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: 9erthalion6(at)gmail(dot)com
Cc: jesper(dot)pedersen(at)redhat(dot)com, david(dot)rowley(at)2ndquadrant(dot)com, florisvannee(at)optiver(dot)com, thomas(dot)munro(at)gmail(dot)com, jtc331(at)gmail(dot)com, rafia(dot)pghackers(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, pg(at)bowt(dot)ie, tomas(dot)vondra(at)2ndquadrant(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, bhushan(dot)uparkar(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, a(dot)korotkov(at)postgrespro(dot)ru
Subject: Re: Index Skip Scan
Date: 2019-07-25 11:17:37
Message-ID: 20190725.201737.192223037.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 24 Jul 2019 22:49:32 +0200, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote in <CA+q6zcXgwDMiowOGbr7gimTY3NV-LbcwP=rbma_L56pc+9p1Xw(at)mail(dot)gmail(dot)com>
> > On Mon, Jul 22, 2019 at 7:10 PM Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> wrote:
> >
> > On 7/22/19 1:44 AM, David Rowley wrote:
> > > Here are the comments I noted down during the review:
> > >
> > > cost_index:
> > >
> > > I know you've not finished here, but I think it'll need to adjust
> > > tuples_fetched somehow to account for estimate_num_groups() on the
> > > Path's unique keys. Any Eclass with an ec_has_const = true does not
> > > need to be part of the estimate there as there can only be at most one
> > > value for these.
> > >
> > > For example, in a query such as:
> > >
> > > SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;
> > >
> > > you only need to perform estimate_num_groups() on "y".
> > >
> > > I'm really not quite sure on what exactly will be required from
> > > amcostestimate() here. The cost of the skip scan is not the same as
> > > the normal scan. So other that API needs adjusted to allow the caller
> > > to mention that we want skip scans estimated, or there needs to be
> > > another callback.
> > >
> >
> > I think this part will become more clear once the index skip scan patch
> > is rebased, as we got the uniquekeys field on the Path, and the
> > indexskipprefixy info on the IndexPath node.
>
> Here is what I came up with to address the problems, mentioned above in this
> thread. It passes tests, but I haven't tested it yet more thoughtful (e.g. it
> occurred to me, that `_bt_read_closest` probably wouldn't work, if a next key,
> that passes an index condition is few pages away - I'll try to tackle it soon).
> Just another small step forward, but I hope it's enough to rebase on top of it
> planner changes.
>
> Also I've added few tags, mostly to mention reviewers contribution.

I have some comments.

+ * The order of columns in the index should be the same, as for
+ * unique distincs pathkeys, otherwise we cannot use _bt_search
+ * in the skip implementation - this can lead to a missing
+ * records.

It seems that it is enough that distinct pathkeys is contained in
index pathkeys. If it's right, that is almost checked in existing
code:

> if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))

It is perfect when needed_pathkeys is distinct_pathkeys. So
additional check is required if and only if it is not the case.

> if (enable_indexskipscan &&
> IsA(path, IndexPath) &&
> ((IndexPath *) path)->indexinfo->amcanskip &&
> (path->pathtype == T_IndexOnlyScan ||
> path->pathtype == T_IndexScan) &&
> (needed_pathkeys == root->distinct_pathkeys ||
> pathkeys_contained_in(root->distinct_pathkeys,
> path->pathkeys)))

path->pathtype is always one of T_IndexOnlyScan or T_IndexScan so
the check against them isn't needed. If you have concern on that,
we can add that as Assert().

I feel uncomfortable to look into indexinfo there. Couldnd't we
use indexskipprefix == -1 to signal !amcanskip from
create_index_path?

+ /*
+ * XXX: In case of index scan quals evaluation happens after
+ * ExecScanFetch, which means skip results could be fitered out
+ */

Why can't we use skipscan path if having filter condition? If
something bad happens, the reason must be written here instead of
what we do.

+ if (path->pathtype == T_IndexScan &&
+ parse->jointree != NULL &&
+ parse->jointree->quals != NULL &&
+ ((List *)parse->jointree->quals)->length != 0)

It's better to use list_length instead of peeping inside. It
handles the NULL case as well. (The structure has recently
changed but .length is not, though.)

+ * If advancing direction is different from index direction, we must
+ * skip right away, but _bt_skip requires a starting point.

It doesn't seem needed to me. Could you elaborate on the reason
for that?

+ * If advancing direction is different from index direction, we must
+ * skip right away, but _bt_skip requires a starting point.
+ */
+ if (direction * indexonlyscan->indexorderdir < 0 &&
+ !node->ioss_FirstTupleEmitted)

I'm confused by this. "direction" there is the physical scan
direction (fwd/bwd) of index scan, which is already compensated
by indexorderdir. Thus the condition means we do that when
logical ordering (ASC/DESC) is DESC. (Though I'm not sure what
"index direction" exactly means...)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rafia Sabih 2019-07-25 11:21:52 Re: Initdb failure
Previous Message Julien Rouhaud 2019-07-25 11:00:34 Re: Add parallelism and glibc dependent only options to reindexdb