Re: why not parallel seq scan for slow functions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-08-02 17:42:40
Message-ID: CAMkU=1ymvFbTCYFgzj45_EMzBg=ddQ_m2j3cObzU=vywqttf-A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >> >>
> >> >> So because of this high projection cost the seqpath and parallel path
> >> >> both have fuzzily same cost but seqpath is winning because it's
> >> >> parallel safe.
> >> >
> >> >
> >> > I think you are correct. However, unless parallel_tuple_cost is set very
> >> > low, apply_projection_to_path never gets called with the Gather path as an
> >> > argument. It gets ruled out at some earlier stage, presumably because it
> >> > assumes the projection step cannot make it win if it is already behind by
> >> > enough.
> >> >
> >>
> >> I think that is genuine because tuple communication cost is very high.
> >
> >
> > Sorry, I don't know which you think is genuine, the early pruning or my
> > complaint about the early pruning.
> >
>
> Early pruning. See, currently, we don't have a way to maintain both
> parallel and non-parallel paths till later stage and then decide which
> one is better. If we want to maintain both parallel and non-parallel
> paths, it can increase planning cost substantially in the case of
> joins. Now, surely it can have benefit in many cases, so it is a
> worthwhile direction to pursue.
>

If I understand it correctly, we do have a way; it can just lead to an
exponential explosion problem, so we are afraid to use it, correct? If I
just lobotomize the path-domination code (make pathnode.c line 466 always
test false):

if (JJ_all_paths == 0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later (after applying your other patch
in this thread) chooses it as the overall best plan. It doesn't even slow
down "make installcheck-parallel" very much, which I guess just means the
regression tests don't contain many complex joins.

But what is an acceptable solution? Is there a heuristic for when
retaining a parallel path could be helpful, the same way there is for
fast-start paths? It seems like the best approach would be to include the
function-evaluation costs at this step in the first place.

Why is the path-cost domination code run before the cost of the function
evaluation is included? Is that because the information needed to compute
it is not available at that point, or because it would be too slow to
include it there? Or just because no one thought it was important?
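
Some back-of-the-envelope arithmetic makes the ordering question concrete. The sketch below is a toy cost model, not PostgreSQL's cost_seqscan() or cost_gather(). The struct `ToyCostModel`, the functions `toy_serial_cost()` and `toy_gather_cost()`, and every number used with them (including the parallel divisor) are hypothetical. The model compares a serial scan with a Gather over a parallel scan, first without and then with the per-row cost of a slow target-list function. It assumes that when the projection is pushed below the Gather, its cost is shared among the participating processes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy cost inputs; all values used with this struct are hypothetical. */
typedef struct ToyCostModel
{
    double rows;                /* rows returned by the scan */
    double scan_cost;           /* serial scan cost, excluding the target list */
    double parallel_divisor;    /* assumed effective number of processes sharing work */
    double parallel_setup_cost; /* one-time cost of launching workers */
    double parallel_tuple_cost; /* cost of passing one tuple through Gather */
    double func_cost_per_row;   /* cost of evaluating the slow function once */
} ToyCostModel;

/* Serial plan: the function is evaluated once per row in a single process. */
double
toy_serial_cost(const ToyCostModel *m, bool include_func)
{
    double cost = m->scan_cost;

    if (include_func)
        cost += m->rows * m->func_cost_per_row;
    return cost;
}

/* Gather plan: the scan and (if pushed down) the projection are split across processes. */
double
toy_gather_cost(const ToyCostModel *m, bool include_func)
{
    double cost;

    assert(m->parallel_divisor > 0.0);
    cost = m->parallel_setup_cost
        + m->scan_cost / m->parallel_divisor
        + m->rows * m->parallel_tuple_cost;
    if (include_func)
        cost += m->rows * m->func_cost_per_row / m->parallel_divisor;
    return cost;
}
```

Take, for example, 1,000,000 rows, a scan cost of 20,000, and a divisor of 2.4. Use the default parallel_setup_cost (1000) and parallel_tuple_cost (0.1). Charge 25 units per function call, as a function declared with COST 10000 would be at the default cpu_operator_cost of 0.0025. Before the function is counted, the Gather plan costs about 109,333 versus 20,000 for the serial plan. Afterwards, it costs about 10.5 million versus about 25.0 million. A domination test run before the function cost is added would therefore discard exactly the plan that wins.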

Cheers,

Jeff
