Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5
Date: 2019-03-05 08:00:43
Message-ID: 5C7E2CAB.2040001@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2019/03/02 1:10), Robert Haas wrote:
> On Fri, Mar 1, 2019 at 5:47 AM Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Robert, I CCed you because you are the author of that commit. Before
>> that commit ("Rewrite the code that applies scan/join targets to
>> paths."), apply_scanjoin_target_to_paths() had a boolean parameter named
>> modify_in_place, and used apply_projection_to_path(), not
>> create_projection_path(), to adjust scan/join paths when modify_in_place
>> was true, which allowed us to save cycles at plan creation time by
>> avoiding creating projection paths, which I think would be a good thing,
>> but that commit removed that. Why?
>
> One of the goals of the commit was to properly account for the cost of
> computing the target list. Before parallel query and partition-wise
> join, it didn't really matter what the cost of computing the target
> list was, because every path was going to have to do the same work, so
> it was just a constant factor getting added to every path. However,
> parallel query and partition-wise join mean that some paths can
> compute the final target list more cheaply than others, and that turns
> out to be important for things like PostGIS. One of the complaints
> that provoked that change was that PostGIS was picking non-parallel
> plans even when a parallel plan was substantially superior, because it
> wasn't accounting for the fact that in the parallel plan, the cost of
> computing the target-list could be shared across all the workers
> rather than paid entirely by the leader.
>
> In order to accomplish this goal of properly accounting for the cost
> of computing the target list, we need to create a new path, not just
> jam the target list into an already-costed path. Note that we did
> some performance optimization around the same time to minimize the
> performance hit here (see d7c19e62a8e0a634eb6b29f8f1111d944e57081f,
> and I think there may have been something else as well although I
> can't find it right now).

apply_projection_to_path() not only jams the given tlist into the
existing path but updates its tlist eval costs appropriately except for
the cases of Gather and GatherMerge:

/*
* If the path happens to be a Gather or GatherMerge path, we'd like to
* arrange for the subpath to return the required target list so that
* workers can help project. But if there is something that is not
* parallel-safe in the target expressions, then we can't.
*/
if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) &&
is_parallel_safe(root, (Node *) target->exprs))
{
/*
* We always use create_projection_path here, even if the
subpath is
* projection-capable, so as to avoid modifying the subpath in
place.
* It seems unlikely at present that there could be any other
* references to the subpath, but better safe than sorry.
*
--> * Note that we don't change the parallel path's cost estimates; it
--> * might be appropriate to do so, to reflect the fact that the
bulk of
--> * the target evaluation will happen in workers.
*/
if (IsA(path, GatherPath))
{
GatherPath *gpath = (GatherPath *) path;

gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
gpath->subpath,
target);
}
else
{
GatherMergePath *gmpath = (GatherMergePath *) path;

gmpath->subpath = (Path *)
create_projection_path(root,
gmpath->subpath->parent,
gmpath->subpath,
target);
}
}

>> The real reason for this question is: I noticed that projection paths
>> put on foreign paths will make it hard for FDWs to detect whether there
>> is an already-well-enough-sorted remote path in the pathlist for the
>> final scan/join relation as an input relation to GetForeignUpperPaths()
>> for the UPPERREL_ORDERED step (the IsA(path, ForeignPath) test would not
>> work well enough to detect remote paths!), so I'm wondering whether we
>> could revive that parameter like the attached, to avoid the overhead at
>> plan creation time and to make the FDW work easy. Maybe I'm missing
>> something, though.
>
> I think this would be a bad idea, for the reasons explained above. I
> also think that it's probably the wrong direction on principle. I
> think the way we account for target lists is still pretty crude and
> needs to be thought of as more of a real planning step and less as
> something that we can just ignore when it's inconvenient for some
> reason.

I'm not sure I 100% agree with you, but I also think we need to give
more thought to the tlist-eval-cost adjustment.

> I think the FDW just needs to look through the projection
> path and see what's underneath, but also take the projection path's
> target list into account when decide whether more can be pushed down.

OK, I'll go with that.

Thanks for the explanation!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-03-05 08:10:36 Re: Re: [PATCH] pgbench tap tests fail if the path contains a perl special character
Previous Message David Steele 2019-03-05 07:57:25 Re: Re: \describe*