Re: [HACKERS] why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date: 2018-03-16 10:06:03
Message-ID: CAA4eK1LHFRTn1fbjDNZ3Z15kpK-mSL85Hta8TeyHXnoEj9WK8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 14, 2018 at 12:01 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> 0003 introduces a new upper relation to represent the result of
> applying the scan/join target to the topmost scan/join relation. I'll
> explain further down why this seems to be needed. Since each path
> must belong to only one relation, this involves using
> create_projection_path() for the non-partial pathlist as we already do
> for the partial pathlist, rather than apply_projection_to_path().
> This is probably marginally more expensive but I'm hoping it doesn't
> matter. (However, I have not tested.)
>

I think in the patch series this is the questionable patch wherein it
will always add an additional projection path (whether or not it is
required) to all Paths (partial and non-partial) for the scanjoin rel
and then later remove it (if not required) in create_projection_plan.
As you are saying, I also think it might not matter much in the grand
scheme of things and if required we can test it as well, however, I
think it is better if some other people can also share their opinion
on this matter.

Tom, do you have anything to say?

Each non-partial path in the
> topmost scan/join rel produces a corresponding path in the new upper
> rel. The same is done for partial paths if the scan/join target is
> parallel-safe; otherwise we can't.
>
> 0004 causes the main part of the planner to skip calling
> generate_gather_paths() from the topmost scan/join rel. This logic is
> mostly stolen from your patch. If the scan/join target is NOT
> parallel-safe, we perform generate_gather_paths() on the topmost
> scan/join rel. If the scan/join target IS parallel-safe, we do it on
> the post-projection rel introduced by 0003 instead. This is the patch
> that actually fixes Jeff's original complaint.
>

Looks good, I feel you can include the test from my patch as well.

> 0005 goes a bit further and removes a bunch of logic from
> create_ordered_paths(). The removed logic tries to satisfy the
> query's ordering requirement via cheapest_partial_path + Sort + Gather
> Merge. Instead, it adds an additional path to the new upper rel added
> in 0003. This again avoids a call to apply_projection_to_path() which
> could cause projection to be pushed down after costing has already
> been fixed. Therefore, it gains the same advantages as 0004 for
> queries that would this sort of plan.
>

After this patch, part of the sorts related work will be done in
create_ordered_paths and the other in grouping_planner, which looks
bit odd, otherwise, I don't see any problem with it.

> Currently, this loses the
> ability to set limit_tuples for the Sort path; that doesn't look too
> hard to fix but I haven't done it. If we decide to proceed with this
> overall approach I'll see about getting it sorted out.
>

Okay, that makes sense.

> Unfortunately, when I initially tried this approach, a number of
> things broke due to the fact that create_projection_path() was not
> exactly equivalent to apply_projection_to_path(). This initially
> surprised me, because create_projection_plan() contains logic to
> eliminate the Result node that is very similar to the logic in
> apply_projection_to_path(). If apply_projection_path() sees that the
> subordinate node is projection-capable, it applies the revised target
> list to the child; if not, it adds a Result node.
> create_projection_plan() does the same thing. However,
> create_projection_plan() can lose the physical tlist optimization for
> the subordinate node; it forces an exact projection even if the parent
> node doesn't require this. 0001 fixes this, so that we get the same
> plans with create_projection_path() that we would with
> apply_projection_to_path(). I think this patch is a good idea
> regardless of what we decide to do about the rest of this, because it
> can potentially avoid losing the physical-tlist optimization in any
> place where create_projection_path() is used.
>
> It turns out that 0001 doesn't manage to preserve the physical-tlist
> optimization when the projection path is attached to an upper
> relation. 0002 fixes this.
>

I have done some basic verification of 0001 and 0002, will do further
review/tests, if I don't see any objection from anyone else about the
overall approach.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-03-16 10:29:59 Re: missing support of named convention for procedures
Previous Message Amit Khandekar 2018-03-16 10:02:03 Hash join in SELECT target list expression keeps consuming memory