Re: postgres_fdw bug in 9.6

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw bug in 9.6
Date: 2016-12-27 07:41:16
Message-ID: 396aac47-68e1-3e2e-2918-5a4ba65b6be5@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/12/22 1:04, Ashutosh Bapat wrote:
> Some review comments

Thanks for the review!

> 2. We should try to look for other not-so-cheap paths if the cheapest one is
> paramterized. You might want to use get_cheapest_path_for_pathkeys() to find a
> suitable unparameterized path by passing NULL for required_outer and NIL for
> pathkeys, that's a very strange usage, but I think it will serve the purpose.
> On the thread we discussed that we should save the epq_path create for lower
> join and use it here. That may be better than searching for a path.
> + /* Give up if the cheapest-total-cost paths are parameterized. */
> + if (!bms_is_empty(PATH_REQ_OUTER(outer_path)) ||
> + !bms_is_empty(PATH_REQ_OUTER(inner_path)))
> + return NULL;

I did that because I think that would work well for postgres_fdw, but I
agree with you. Will revise.

> 3. Talking about saving some CPU cycles - if a clauseless full join can be
> implemented only using merge join, probably that's the only path available in
> the list of paths for the given relation. Instead of building the same path
> anew, should we use the existing path like GetExistingLocalJoinPath() for that
> case?

Hm, that might be an idea, but my concern about that is: the existing
path wouldn't always be guaranteed to be unprameterized.

> In fact, I am wondering whether we should look for an existing nestloop
> path for all joins except full, in which case we should look for merge or hash
> join path. We go on building a new path, if only there isn't an existing one.
> That will certainly save some CPU cycles spent in costing the path.

Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
removed from the rel's pathlist by dominated merge or hash join paths,
so searching the pathlist might cause a useless overhead.

> 4. Following comment mentions only hash join, but the code creates a merge join
> or hash join path.
> * If the jointype is JOIN_FULL, try to create a hashjoin join path from

Will revise.

> 5. Per comment below a clauseless full join can be implemented using only merge
> join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true here?
> /*
> * Special corner case: for "x FULL JOIN y ON true", there will be no
> * join clauses at all. Note that mergejoin is our only join type
> * that supports FULL JOIN without any join clauses, and in that case
> * it doesn't require the input paths to be well ordered, so generate
> * a clauseless mergejoin path from the cheapest-total-cost paths.
> */
> if (extra->mergejoin_allowed && !extra->mergeclause_list)

See the comments for select_mergejoin_clauses:

* select_mergejoin_clauses
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
* this is a right/full join and there are nonmergejoinable join clauses.
* The executor's mergejoin machinery cannot handle such cases, so we have
* to avoid generating a mergejoin plan. (Note that this flag does NOT
* consider whether there are actually any mergejoinable clauses. This is
* correct because in some cases we need to build a clauseless mergejoin.
* Simply returning NIL is therefore not enough to distinguish safe from
* unsafe cases.)

> Rethinking about the problem, the error comes because the inner or outer plan
> of a merge join plan doesn't have pathkeys required by the merge join. This
> happens because the merge join path had foreign path with pathkeys as inner or
> outer path and corresponding fdw_outerpath didn't have those pathkeys. I am
> wondering whether the easy and possibly correct solution here is to not replace
> a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we don't do
> that, there won't be error building merge join plan and
> postgresRecheckForeignScan() would correctly route the EPQ checks to the local
> plan available as outer plan. Attached patch with that code removed.

That might be fine for PG9.6, but I'd like to propose replacing
GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
GetExistingLocalJoinPath might choose an overkill, merge or hash join
path for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might
cause an overhead at EPQ rechecks, and (2) choosing a local join path
randomly from the rel's pathlist wouldn't be easy to understand.

I'll add this to the next CF.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-12-27 07:54:27 merging some features from plpgsql2 project
Previous Message Amit Kapila 2016-12-27 06:58:25 Re: Write Ahead Logging for Hash Indexes