Re: Index Skip Scan (new UniqueKeys)

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Floris Van Nee <florisvannee(at)Optiver(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Index Skip Scan (new UniqueKeys)
Date: 2020-07-14 16:18:52
Message-ID: 20200714161852.s27ikts2dppvbm3h@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Sun, Jul 12, 2020 at 12:48:47PM +0000, Floris Van Nee wrote:
> >
> > Good point, thanks for looking at this. With the latest planner version there
> > are indeed more possibilities to use skipping. It never occured to me that
> > some of those paths will still rely on index scan returning full data set. I'll look
> > in details and add verification to prevent putting something like this on top of
> > skip scan in the next version.
>
> I believe the required changes are something like in attached patch. There were a few things I've changed:
> - build_uniquekeys was constructing the list incorrectly. For a DISTINCT a,b, it would create two unique keys, one with a and one with b. However, it should be one unique key with (a,b).

Yes, I've also noticed that while preparing fix for index scan not
covered by index and included it.

> - the uniquekeys that is built, still contains some redundant keys, that are normally eliminated from the path keys lists.

I guess you're talking about:

+ if (EC_MUST_BE_REDUNDANT(ec))
+ continue;

Can you add some test cases to your changes to show the effect of it? It
seem to me redundant keys are already eliminated at this point by either
make_pathkeys_for_uniquekeys or even earlier for distinct on, but could
be I've missed something.

Along the lines I'm also curious about this part:

- ListCell *k;
- List *exprs = NIL;
-
- foreach(k, ec->ec_members)
- {
- EquivalenceMember *mem = (EquivalenceMember *) lfirst(k);
- exprs = lappend(exprs, mem->em_expr);
- }
-
- result = lappend(result, makeUniqueKey(exprs, false, false));
+ EquivalenceMember *mem = (EquivalenceMember*) lfirst(list_head(ec->ec_members));

I'm curious about this myself, maybe someone can clarify. It looks like
generaly speaking there could be more than one member (if not
ec_has_volatile), which "representing knowledge that multiple items are
effectively equal". Is this information is not interesting enough to
preserve it in unique keys?

> - the distinct_pathkeys may be NULL, even though there's a possibility for skipping. But it wouldn't create the uniquekeys in this case. This makes the planner not choose skip scans even though it could. For example in queries that do SELECT DISTINCT ON (a) * FROM t1 WHERE a=1 ORDER BY a,b; Since a is constant, it's eliminated from regular pathkeys.

What would be the point of skipping if it's a constant?

> - to combat the issues mentioned earlier, there's now a check in build_index_paths that checks if the query_pathkeys matches the useful_pathkeys. Note that we have to use the path keys here rather than any of the unique keys. The unique keys are only Expr nodes - they do not contain the necessary information about ordering. Due to elimination of some constant path keys, we have to search the attributes of the index to find the correct prefix to use in skipping.

IIUC here you mean this function, right?

+ prefix = find_index_prefix_for_pathkey(root,
+ index,
+ BackwardScanDirection,
+ llast_node(PathKey,
+ root->distinct_pathkeys));

Doesn't it duplicate the job already done in build_index_pathkeys by
building those pathkeys again? If yes, probably it's possible to reuse
useful_pathkeys. Not sure about unordered indexes, but looks like
query_pathkeys should also match in this case.

Will also look at the follow up questions in the next email.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-07-14 16:58:22 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Robert Haas 2020-07-14 16:08:46 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit