Re: why not parallel seq scan for slow functions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why not parallel seq scan for slow functions
Date: 2017-11-06 10:20:28
Message-ID: CAA4eK1J+S4W+W9oaToNHA8x8xh_Kv9wQgGbPfjo=-HtnsPQO9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Thanks for the confirmation. Find rebased patch attached.
>
> This looks like it's on the right track to me. I hope Tom will look
> into it, but if he doesn't I may try to get it committed myself.
>
> - if (rel->reloptkind == RELOPT_BASEREL)
> - generate_gather_paths(root, rel);
> + if (rel->reloptkind == RELOPT_BASEREL &&
> + root->simple_rel_array_size > 2 &&
> + !root->append_rel_list)
>
> This test doesn't look correct to me. Actually, it doesn't look
> anywhere close to correct to me. So, one of us is very confused...
> not sure whether it's you or me.
>

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus as it stands the regression tests and some manual
verification. Can you explain what is your concern about this test?

> simple_gather_path = (Path *)
> create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
> NULL, NULL);
> +
> + /* Add projection step if needed */
> + if (target && simple_gather_path->pathtarget != target)
> + simple_gather_path = apply_projection_to_path(root, rel,
> simple_gather_path, target);
>
> Instead of using apply_projection_to_path, why not pass the correct
> reltarget to create_gather_path?
>

We need to push it to gather's subpath as is done in
apply_projection_to_path and then we have to cost it accordingly. I
think if we don't use apply_projection_to_path then we might end up
with much of the code similar to it in generate_gather_paths. In
fact, I have tried something similar to what you are suggesting in the
first version of patch [1] and it didn't turn out to be clean. Also,
I think we already do something similar in create_ordered_paths.

> + /* Set or update cheapest_total_path and related fields */
> + set_cheapest(current_rel);
>
> I wonder if it's really OK to call set_cheapest() a second time for
> the same relation...
>

I think if we want we can avoid it by checking whether we have
generated any gather path for the relation (basically, check if it has
partial path list). Another idea could be that we consider the
generation of gather/gathermerge for top-level scan/join relation as a
separate step and generate a new kind of upper rel for it which will
be a mostly dummy but will have paths for gather/gathermerge.

[1] - https://www.postgresql.org/message-id/CAA4eK1JUvL9WS9z%3D5hjSuSMNCo3TdBxFa0pA%3DE__E%3Dp6iUffUQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-11-06 11:14:51 Re: Display number of heap accesses for index scans
Previous Message Ashutosh Bapat 2017-11-06 10:19:32 Re: [PATCH] Overestimated filter cost and its mitigation