Re: Index Skip Scan

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, florisvannee(at)optiver(dot)com, pg(at)bowt(dot)ie, jesper(dot)pedersen(at)redhat(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, thomas(dot)munro(at)gmail(dot)com, jtc331(at)gmail(dot)com, rafia(dot)pghackers(at)gmail(dot)com, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, bhushan(dot)uparkar(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Skip Scan
Date: 2020-03-06 14:49:17
Message-ID: 20200306144917.2esaoyttgimt3lgn@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
>On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>> Here is something similar to what I had in mind.
>
>(changing to this email address for future emails)
>
>Hi,
>
>I've been looking over v32 of the patch and have a few comments
>regarding the planner changes.
>
>I think the changes in create_distinct_paths() need more work. The
>way I think this should work is that create_distinct_paths() gets to
>know exactly nothing about what path types support the elimination of
>duplicate values. The Path should carry the UniqueKeys so that can be
>determined. In create_distinct_paths() you should just be able to make
>use of those paths, which should already have been created when
>creating index paths for the rel due to PlannerInfo's query_uniquekeys
>having been set.
>

+1 to code this in a generic way, using query_uniquekeys (if possible)

>The reason it must be done this way is that when the RelOptInfo that
>we're performing the DISTINCT on is a joinrel, then we're not going to
>see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
>of Join path instead. I understand you're not yet supporting doing
>this optimisation when joins are involved, but it should be coded in
>such a way that it'll work when we do. (It's probably also a separate
>question as to whether we should have this only work when there are no
>joins. I don't think I personally object to it for stage 1, but
>perhaps someone else might think differently.)
>

I don't follow. Can you elaborate more?

AFAICS skip-scan is essentially a capability of an (index) AM. I don't
see how we could ever do that for joinrels? We can do that at the scan
level, below a join, but that's what this patch already supports, I
think. When you say "supporting this optimisation" with joins, do you
mean doing skip-scan for join inputs, or on top of the join?

>For storing these new paths with UniqueKeys, I'm not sure exactly if
>we can just add_path() such paths into the RelOptInfo's pathlist.
>What we don't want to do is accidentally make use of paths which
>eliminate duplicate values when we don't want that behaviour. If we
>did store these paths in RelOptInfo->pathlist then we'd need to go and
>modify a bunch of places to ignore such paths. set_cheapest() would
>have to do something special for them too, which makes me think
>pathlist is the incorrect place. Parallel query added
>partial_pathlist, so perhaps we need unique_pathlist to make this
>work.
>

Hmmm, good point. Do we actually produce incorrect plans with the
current patch, using skip-scan path when we should not?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeremy Finzel 2020-03-06 15:18:34 Re: PHJ file leak.
Previous Message David Steele 2020-03-06 14:48:50 Re: More tests to stress directly checksum_impl.h