Re: why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-08-21 09:08:04
Message-ID: CAA4eK1JUvL9WS9z=5hjSuSMNCo3TdBxFa0pA=E__E=p6iUffUQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 16, 2017 at 5:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Aug 16, 2017 at 7:23 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>> I think skipping a generation of gather paths for scan node or top
>>>> level join node generated via standard_join_search seems straight
>>>> forward, but skipping for paths generated via geqo seems to be tricky
>>>> (See use of generate_gather_paths in merge_clump). Assuming, we find
>>>> some way to skip it for top level scan/join node, I don't think that
>>>> will be sufficient, we have some special way to push target list below
>>>> Gather node in apply_projection_to_path, we need to move that part as
>>>> well in generate_gather_paths.
>>>
>>> I don't think that can work, because at that point we don't know what
>>> target list the upper node wants to impose.
>>>
>>
>> I am suggesting to call generate_gather_paths just before we try to
>> apply projection on paths in grouping_planner (file:planner.c;
>> line:1787; commit:004a9702). Won't the target list for upper nodes be
>> available at that point?
>
> Oh, yes. Apparently I misunderstood your proposal.
>

Thanks for acknowledging the idea. I have written a patch that
implements it. At this stage, it is meant only to move the
discussion forward. A few things I am not entirely happy about in
this patch are:

(a) To skip generating a gather path for the top-level scan node, I
have used the number of relations that have a RelOptInfo, basically
simple_rel_array_size. Is there any problem with that, or do you see
a better way?
(b) I have changed the costing of the gather path for the path target
in generate_gather_paths, which I am not sure is the best way. Another
possibility would have been to change the code in
apply_projection_to_path, as done in the previous patch, and just call
it from generate_gather_paths. I have not done that because of your
comment earlier in the thread ("is probably unsafe, because it might
confuse code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost)."). It is not clear to me what exactly bothers you
about changing the costing directly in apply_projection_to_path.
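
To make the concern quoted in (b) concrete, here is a minimal toy sketch in C. It is not PostgreSQL code and not taken from the patch: ToyPath and the toy_* functions are hypothetical names invented for illustration, and the "path list" is just an array of pointers assumed to be kept in ascending cost order. It only shows one way an in-place cost change can break that ordering for code holding another pointer to the same path, while building a modified copy leaves the list intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planner path: only the field this sketch needs. */
typedef struct ToyPath
{
    double total_cost;
} ToyPath;

/* Returns true if paths[0..n-1] is in non-decreasing total_cost order. */
static bool
toy_pathlist_is_sorted(ToyPath *const *paths, size_t n)
{
    assert(paths != NULL || n == 0);

    for (size_t i = 1; i < n; i++)
    {
        if (paths[i - 1]->total_cost > paths[i]->total_cost)
            return false;
    }
    return true;
}

/*
 * Hypothetical in-place update: charge extra target-list cost to a path
 * that may also be reachable through a cost-sorted list.
 */
static void
toy_add_tlist_cost_in_place(ToyPath *path, double extra_cost)
{
    path->total_cost += extra_cost;
}

/*
 * Hypothetical alternative: return a modified copy and leave the original
 * path (and therefore any list that points at it) untouched.
 */
static ToyPath
toy_copy_with_tlist_cost(const ToyPath *path, double extra_cost)
{
    ToyPath copy = *path;

    copy.total_cost += extra_cost;
    return copy;
}
```

With two paths costing 10 and 20 held in a sorted list, adding 15 to the first one in place leaves the list in the order 25, 20, so the sortedness assumption no longer holds; the copying variant keeps the list at 10, 20 and yields a separate path costing 25.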

Thoughts?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: parallel_paths_include_tlist_cost_v1.patch (application/octet-stream, 11.4 KB)
