Re: why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-08-12 13:18:51
Message-ID: CAA4eK1+X89Qk8k3Q9feiOyy5rvbiMjsS55e6pDLA0zYRU+ACMg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 10, 2017 at 1:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Aug 8, 2017 at 3:50 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Right.
>>
>> I see two ways to include the cost of the target list for parallel
>> paths before rejecting them (a) Don't reject parallel paths
>> (Gather/GatherMerge) during add_path. This has the danger of path
>> explosion. (b) In the case of parallel paths, somehow try to identify
>> that path has a costly target list (maybe just check if the target
>> list has anything other than vars) and use it as a heuristic to decide
>> whether a parallel path can be retained.
>
> I think the right approach to this problem is to get the cost of the
> GatherPath correct when it's initially created. The proposed patch
> changes the cost after-the-fact, but that (1) doesn't prevent a
> promising path from being rejected before we reach this point and (2)
> is probably unsafe, because it might confuse code that reaches the
> modified-in-place path through some other pointer (e.g. code which
> expects the RelOptInfo's paths to still be sorted by cost). Perhaps
> the way to do that is to skip generate_gather_paths() for the toplevel
> scan/join node and do something similar later, after we know what
> target list we want.
>

Skipping the generation of gather paths for the scan node or the
top-level join node produced by standard_join_search seems
straightforward, but skipping it for paths generated via geqo looks
tricky (see the use of generate_gather_paths in merge_clump). Even
assuming we find some way to skip it for the top-level scan/join node,
I don't think that will be sufficient: apply_projection_to_path has
special handling to push the target list below the Gather node, and we
would need to move that part into generate_gather_paths as well.
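
To make the costing concern concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL's actual cost model: the Path class, the
helper functions, and all constants are hypothetical and chosen for
illustration, and worker cost is simply divided by the number of workers. It
compares a sequential path with a Gather path (1) before the target list
cost is known, (2) after it is known with the target list pushed below
Gather, and (3) after it is known with the target list evaluated above
Gather.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    total_cost: float


# Illustrative numbers only (assumptions, not taken from the thread).
ROWS = 1_000_000
SCAN_COST_PER_ROW = 0.01     # cheap per-row scan work
FUNC_COST_PER_ROW = 1.0      # slow function in the target list
WORKERS = 4
GATHER_SETUP_COST = 1000.0
GATHER_COST_PER_ROW = 0.1    # cost of passing a row from worker to leader


def seq_path(target_cost_per_row: float) -> Path:
    """Sequential scan that also evaluates the target list."""
    return Path("SeqScan", ROWS * (SCAN_COST_PER_ROW + target_cost_per_row))


def gather_path(target_cost_per_row: float, target_below_gather: bool) -> Path:
    """Gather over a parallel scan; the target list runs in the workers
    (below Gather) or in the leader (above Gather)."""
    per_row_in_worker = SCAN_COST_PER_ROW
    if target_below_gather:
        per_row_in_worker += target_cost_per_row
    cost = ROWS * per_row_in_worker / WORKERS
    cost += GATHER_SETUP_COST + ROWS * GATHER_COST_PER_ROW
    if not target_below_gather:
        cost += ROWS * target_cost_per_row  # leader evaluates every row
    return Path("Gather", cost)


def cheapest(paths: list[Path]) -> Path:
    """Stand-in for cost-based pruning: keep only the cheapest path."""
    return min(paths, key=lambda p: p.total_cost)


# (1) Compared before the target list cost is included: Gather loses.
early_winner = cheapest([seq_path(0.0), gather_path(0.0, True)])

# (2) Compared with the target list cost, target list below Gather.
late_winner = cheapest(
    [seq_path(FUNC_COST_PER_ROW), gather_path(FUNC_COST_PER_ROW, True)]
)

# (3) Same, but with the target list evaluated above Gather.
leader_winner = cheapest(
    [seq_path(FUNC_COST_PER_ROW), gather_path(FUNC_COST_PER_ROW, False)]
)

print(early_winner.name, late_winner.name, leader_winner.name)
```

In this model the Gather path is discarded in case (1) even though it wins
in case (2), and it only wins when the expensive target list is costed in
the workers. That is why both the timing of the comparison and the placement
of the projection matter.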

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
