Re: How to retain lesser paths at add_path()?

From: Kohei KaiGai <kaigai(at)heterodb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Richard Guo <riguo(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to retain lesser paths at add_path()?
Date: 2019-08-12 04:28:03
Message-ID: CAOP8fzbM+6wSngKTdTY7xpf=DAAmK=CNeACJVy2VpCtGwx8=WQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

For implementation of the concept, this patch puts a hook on add_path
/ add_partial_path
to override the path removal decision by extensions, according to its
own viewpoint.
We don't add any metrics other than path's cost and path keys, so
set_cheapest() still picks
up paths based on its cost for each depth.
As we are currently doing, extensions (FDW / CSP) are responsible to
construct and add
paths with reasonable cost values, then PostgreSQL optimizer chooses
the "best" path
according to the (self-reported) cost. On the other hands, an
expensive path at a particular
level is not always expensive from upper viewpoint, if combination of
path-A and path-B
has special optimization, like a reduction of DMA transfer between
host and device, or omit
of network transfer between local and remote.
In these cases, extension can get a control to override a decision
whether old_path that
is dominated by new_path shall be removed, or not. If old_path
survived, extension can
re-use the path at the upper level to construct a special path.

Best regards,

2019年8月1日(木) 21:14 Kohei KaiGai <kaigai(at)heterodb(dot)com>:
>
> 2019年8月1日(木) 19:24 Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>:
> >
> > On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
> > >2019年8月1日(木) 16:19 Richard Guo <riguo(at)pivotal(dot)io>:
> > >>
> > >> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai(at)heterodb(dot)com> wrote:
> > >>>
> > >>> 2019年8月1日(木) 1:41 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> > >>> >
> > >>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > >>> > > Yeah, but I have to admit that this whole design makes me kinda
> > >>> > > uncomfortable. Every time somebody comes up with a new figure of
> > >>> > > merit, it increases not only the number of paths retained but also the
> > >>> > > cost of comparing two paths to possibly reject one of them. A few
> > >>> > > years ago, you came up with the (good) idea of rejecting some join
> > >>> > > paths before actually creating the paths, and I wonder if we ought to
> > >>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
> > >>> > > been saying, we ought to think about planning top-down with
> > >>> > > memoization instead of bottom up (yeah, I know that's a huge change).
> > >>> > > It just feels like the whole idea of a list of paths ordered by cost
> > >>> > > breaks down when there are so many ways that a not-cheapest path can
> > >>> > > still be worth keeping. Not sure exactly what would be better, though.
> > >>> >
> > >>> > Yeah, I agree that add_path is starting to feel creaky. I don't
> > >>> > know what to do instead though. Changing to a top-down design
> > >>> > sounds like it would solve some problems while introducing others
> > >>> > (not to mention the amount of work and breakage involved).
> > >>> >
> > >>> Hmm... It looks the problem we ought to revise about path construction
> > >>> is much larger than my expectation, and uncertain for me how much works
> > >>> are needed.
> > >>>
> > >>> Although it might be a workaround until fundamental reconstruction,
> > >>> how about to have a margin of estimated cost to reject paths?
> > >>> Current add_path() immediately rejects lesser paths if its cost is
> > >>> even a little more expensive than the compared one. One the other hands,
> > >>
> > >>
> > >> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
> > >> costs of two paths, although the fuzz factor (1%) is hard coded and not
> > >> user-controllable.
> > >>
> > >Ah, sorry, I oversight this logic...
> > >
> >
> > FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
> > solution, because how would you know what value is the right one? Why ould
> > 10% be the right threshold, for example? In my experience these these
> > hard-coded coefficients imply behavior that's difficult to predict and
> > explain to users.
> >
> Ah... That's exactly hard task to explain to users.
>
> > >>> I understand it is not an essential re-design of path-construction logic, and
> > >>> may have limitation. However, amount of works are reasonable and no side-
> > >>> effect. (current behavior = 0% threshold).
> > >>> How about your opinions?
> > >>>
> > >>
> > >> How's about Tom's suggestion on adding another dimension in add_path()
> > >> to be considered, just like how it considers paths of better sort order
> > >> or parallel-safe?
> > >>
> > >Robert also mentioned it makes comparison operation more complicated.
> > >If we try to have another dimension here, a callback function in Path node
> > >may be able to tell the core optimizer whether "dominated path" shall be
> > >dropped or not, without further complexity. It is just an idea.
> > >
> >
> > I think adding a hook to add_path() allowing to override the decidion
> > should be OK. The chance of getting that committed in the near future
> > seems much higher than for a patch that completely reworks add_path().
> >
> > There's one caveat, though - AFAICS various places in the planner use
> > things like cheapest_total_path, cheapest_startup_path and even
> > get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
> > considers startup/total cost. It might happen that even after keeping
> > additional paths, the planner still won't use them :-(
> >
> Even if existing code looks at only cheapest_xxx_path, I don't think it is
> a problematic behavior because these paths are exactly cheapest at a level,
> but they may use more expensive paths in the deeper level.
> If a hook can prevent dropping a path, not cheapest, in a particular level,
> this path shall not appear on the cheapest_xxx_path, however, upper level
> path construction logic can pick up these paths as a candidate of input.
> If it has special discount factor here, the startup/total cost of the
> upper level
> path may have cheaper cost in spite of expensive input cost.
>
> In this scenario, this hook gives a decision whether dominated path-node
> shall be dropped or not. So, core optimizer still compares path-node using
> estimated cost value.
>
> In my scenario, even if GpuJoin is expensive at top level of JOIN/SCAN path
> construction, GpuPreAgg + GpuJoin may be cheaper than others because of
> data exchange on GPU device memory.
> As long as GpuJoin is not removed from the pathlist, extension can build its
> custom-path with cheaper cost.
>
> Best regards,
> --
> HeteroDB, Inc / The PG-Strom Project
> KaiGai Kohei <kaigai(at)heterodb(dot)com>

--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai(at)heterodb(dot)com>

Attachment Content-Type Size
pgsql13-path_removal_decision_hook.v1.patch application/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2019-08-12 06:03:14 Asymmetric partition-wise JOIN
Previous Message Thomas Munro 2019-08-12 01:04:38 Re: SegFault on 9.6.14