Re: How to retain lesser paths at add_path()?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Kohei KaiGai <kaigai(at)heterodb(dot)com>
Cc: Richard Guo <riguo(at)pivotal(dot)io>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to retain lesser paths at add_path()?
Date: 2019-08-01 10:24:54
Message-ID: 20190801102454.qwvfkloue4w67pyk@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 01, 2019 at 06:28:08PM +0900, Kohei KaiGai wrote:
>2019年8月1日(木) 16:19 Richard Guo <riguo(at)pivotal(dot)io>:
>>
>> On Thu, Aug 1, 2019 at 2:12 PM Kohei KaiGai <kaigai(at)heterodb(dot)com> wrote:
>>>
>>> 2019年8月1日(木) 1:41 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> >
>>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> > > Yeah, but I have to admit that this whole design makes me kinda
>>> > > uncomfortable. Every time somebody comes up with a new figure of
>>> > > merit, it increases not only the number of paths retained but also the
>>> > > cost of comparing two paths to possibly reject one of them. A few
>>> > > years ago, you came up with the (good) idea of rejecting some join
>>> > > paths before actually creating the paths, and I wonder if we ought to
>>> > > try to go further with that somehow. Or maybe, as Peter Geoghegan, has
>>> > > been saying, we ought to think about planning top-down with
>>> > > memoization instead of bottom up (yeah, I know that's a huge change).
>>> > > It just feels like the whole idea of a list of paths ordered by cost
>>> > > breaks down when there are so many ways that a not-cheapest path can
>>> > > still be worth keeping. Not sure exactly what would be better, though.
>>> >
>>> > Yeah, I agree that add_path is starting to feel creaky. I don't
>>> > know what to do instead though. Changing to a top-down design
>>> > sounds like it would solve some problems while introducing others
>>> > (not to mention the amount of work and breakage involved).
>>> >
>>> Hmm... It looks the problem we ought to revise about path construction
>>> is much larger than my expectation, and uncertain for me how much works
>>> are needed.
>>>
>>> Although it might be a workaround until fundamental reconstruction,
>>> how about to have a margin of estimated cost to reject paths?
>>> Current add_path() immediately rejects lesser paths if its cost is
>>> even a little more expensive than the compared one. One the other hands,
>>
>>
>> Hmm.. I don't think so. Currently add_path() uses fuzzy comparisons on
>> costs of two paths, although the fuzz factor (1%) is hard coded and not
>> user-controllable.
>>
>Ah, sorry, I oversight this logic...
>

FWIW I doubt adding larger "fuzz factor" is unlikely to be a reliable
solution, because how would you know what value is the right one? Why ould
10% be the right threshold, for example? In my experience these these
hard-coded coefficients imply behavior that's difficult to predict and
explain to users.

>>> I understand it is not an essential re-design of path-construction logic, and
>>> may have limitation. However, amount of works are reasonable and no side-
>>> effect. (current behavior = 0% threshold).
>>> How about your opinions?
>>>
>>
>> How's about Tom's suggestion on adding another dimension in add_path()
>> to be considered, just like how it considers paths of better sort order
>> or parallel-safe?
>>
>Robert also mentioned it makes comparison operation more complicated.
>If we try to have another dimension here, a callback function in Path node
>may be able to tell the core optimizer whether "dominated path" shall be
>dropped or not, without further complexity. It is just an idea.
>

I think adding a hook to add_path() allowing to override the decidion
should be OK. The chance of getting that committed in the near future
seems much higher than for a patch that completely reworks add_path().

There's one caveat, though - AFAICS various places in the planner use
things like cheapest_total_path, cheapest_startup_path and even
get_cheapest_path_for_pathkeys() which kinda assumes add_path() only
considers startup/total cost. It might happen that even after keeping
additional paths, the planner still won't use them :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-08-01 10:30:01 Re: Support for jsonpath .datetime() method
Previous Message Thomas Munro 2019-08-01 10:22:23 Re: SQL:2011 PERIODS vs Postgres Ranges?