Re: [HACKERS] why not parallel seq scan for slow functions

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] why not parallel seq scan for slow functions
Date: 2018-02-19 04:05:54
Message-ID: CAFjFpReaxMEgYzZL6zqV_fu0F77C3rQqxciBG0DvpZJT6ktG3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
>>> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>>>> I happened to look at the patch for something else. But here are some
>>>> comments. If any of those have been already discussed, please feel
>>>> free to ignore. I have gone through the thread cursorily, so I might
>>>> have missed something.
>>>>
>>>> In grouping_planner() we call query_planner() first which builds the
>>>> join tree and creates paths, then calculate the target for scan/join
>>>> rel which is applied on the topmost scan join rel. I am wondering
>>>> whether we can reverse this order to calculate the target list of
>>>> scan/join relation and pass it to standard_join_search() (or the hook)
>>>> through query_planner().
>>>>
>>>
>>> I think the reason for doing in that order is that we can't compute
>>> target's width till after query_planner(). See the below note in
>>> code:
>>>
>>> /*
>>> * Convert the query's result tlist into PathTarget format.
>>> *
>>> * Note: it's desirable to not do this till after query_planner(),
>>> * because the target width estimates can use per-Var width numbers
>>> * that were obtained within query_planner().
>>> */
>>> final_target = create_pathtarget(root, tlist);
>>>
>>> Now, I think we can try to juggle the code in a way that the width can
>>> be computed later, but the rest can be done earlier.
>>
>> /* Convenience macro to get a PathTarget with valid cost/width fields */
>> #define create_pathtarget(root, tlist) \
>> set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
>> create_pathtarget already works that way. We will need to split it.
>>
>> Create the Pathtarget without widths. Apply width estimates once we
>> know the width of Vars somewhere here in query_planner()
>> /*
>> * We should now have size estimates for every actual table involved in
>> * the query, and we also know which if any have been deleted from the
>> * query by join removal; so we can compute total_table_pages.
>> *
>> * Note that appendrels are not double-counted here, even though we don't
>> * bother to distinguish RelOptInfos for appendrel parents, because the
>> * parents will still have size zero.
>> *
>> The next step is building the join tree. Set the pathtarget before that.
>>
>>> However, I think
>>> that will be somewhat major change
>>
>> I agree.
>>
>>> still won't address all kind of
>>> cases (like for ordered paths) unless we can try to get all kind of
>>> targets pushed down in the call stack.
>>
>> I didn't understand that.
>>
>
> The places where we use a target different than the target which is
> pushed via query planner can cause a similar problem (ex. see the
> usage of adjust_paths_for_srfs) because the cost of that target
> wouldn't be taken into consideration for Gather's costing.
>

Right. Right now apply_projection_to_path() or adjust_paths_for_srfs()
do not take into consideration the type of path whose cost is being
adjusted for the new targetlist. That works for most of the paths but
not all the paths like custom, FDW or parallel paths. The idea I am
proposing is to compute final targetlist before query planner so that
it's available when we create paths for the topmost scan/join
relation. That way, any path creation logic can then take advantage of
this list to compute costs, not just parallel paths.

--
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-02-19 04:19:10 Re: non-bulk inserts and tuple routing
Previous Message Michael Paquier 2018-02-19 03:42:35 Re: Creation of wiki page for open items of v11