Re: postgres_fdw bug in 9.6

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw bug in 9.6
Date: 2016-12-27 13:03:11
Message-ID: CAFjFpRcQDfT6p23JJXrHN=w6enhVR42Ksz6coaVZL8Tba3OPhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>> 3. Talking about saving some CPU cycles - if a clauseless full join can be
>> implemented only using merge join, probably that's the only path available
>> in
>> the list of paths for the given relation. Instead of building the same
>> path
>> anew, should we use the existing path like GetExistingLocalJoinPath() for
>> that
>> case?
>
>
> Hm, that might be an idea, but my concern about that is: the existing path
> wouldn't always be guaranteed to be unprameterized.

Why? We retain all the parameterizations (including no
parameterization) available in the pathlist, so if it's possible to
create an unparameterized path, there will be one.

>
>> In fact, I am wondering whether we should look for an existing nestloop
>> path for all joins except full, in which case we should look for merge or
>> hash
>> join path. We go on building a new path, if only there isn't an existing
>> one.
>> That will certainly save some CPU cycles spent in costing the path.
>
>
> Maybe in many cases, nestloop paths for INNER/LEFT/SEMI/ANTI might be
> removed from the rel's pathlist by dominated merge or hash join paths, so
> searching the pathlist might cause a useless overhead.
>

Fare point.

>
>> 5. Per comment below a clauseless full join can be implemented using only
>> merge
>> join. Why are we checking extra->mergejoin_allowed? Shouldn't it be true
>> here?
>> /*
>> * Special corner case: for "x FULL JOIN y ON true", there will be
>> no
>> * join clauses at all. Note that mergejoin is our only join type
>> * that supports FULL JOIN without any join clauses, and in that
>> case
>> * it doesn't require the input paths to be well ordered, so
>> generate
>> * a clauseless mergejoin path from the cheapest-total-cost paths.
>> */
>> if (extra->mergejoin_allowed && !extra->mergeclause_list)
>
>
> See the comments for select_mergejoin_clauses:
>
> * select_mergejoin_clauses
> * Select mergejoin clauses that are usable for a particular join.
> * Returns a list of RestrictInfo nodes for those clauses.
> *
> * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
> * this is a right/full join and there are nonmergejoinable join clauses.
> * The executor's mergejoin machinery cannot handle such cases, so we have
> * to avoid generating a mergejoin plan. (Note that this flag does NOT
> * consider whether there are actually any mergejoinable clauses. This is
> * correct because in some cases we need to build a clauseless mergejoin.
> * Simply returning NIL is therefore not enough to distinguish safe from
> * unsafe cases.)

If mergejoin_allowed is true and mergeclauselist is non-NIL but
hashclauselist is NIL (that's rare but there can be types has merge
operators but not hash operators), we will end up returning NULL. I
think we want to create a merge join in that case. I think the order
of conditions should be 1. check hashclause_list then create hash join
2. else check if merge allowed, create merge join. It looks like that
would cover all the cases, if there aren't any hash clauses, and also
no merge clauses, we won't be able to implement a FULL join, so it
will get rejected during path creation itself.

>
>> Rethinking about the problem, the error comes because the inner or outer
>> plan
>> of a merge join plan doesn't have pathkeys required by the merge join.
>> This
>> happens because the merge join path had foreign path with pathkeys as
>> inner or
>> outer path and corresponding fdw_outerpath didn't have those pathkeys. I
>> am
>> wondering whether the easy and possibly correct solution here is to not
>> replace
>> a ForeignPath with fdw_outerpath in GetExistingLocalJoinPath()? If we
>> don't do
>> that, there won't be error building merge join plan and
>> postgresRecheckForeignScan() would correctly route the EPQ checks to the
>> local
>> plan available as outer plan. Attached patch with that code removed.
>
>
> That might be fine for PG9.6, but I'd like to propose replacing
> GetExistingLocalJoinPath with CreateLocalJoinPath for PG10, because (1)
> GetExistingLocalJoinPath might choose an overkill, merge or hash join path
> for INNER/LEFT/SEMI/ANTI, not a nestloop join path, which might cause an
> overhead at EPQ rechecks, and

The reason we chose to pick up an existing path was that the
discussion in thread [1] concluded the efficiency of the local plan
wasn't a concern for EPQ. Are we now saying something otherwise?

> (2) choosing a local join path randomly from
> the rel's pathlist wouldn't be easy to understand.
>

Easy to understand for whom? Can you please elaborate?

[1] https://www.postgresql.org/message-id/flat/565E638E(dot)8020703%40lab(dot)ntt(dot)co(dot)jp#565E638E(dot)8020703(at)lab(dot)ntt(dot)co(dot)jp
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2016-12-27 13:15:41 Re: PATCH: recursive json_populate_record()
Previous Message Etsuro Fujita 2016-12-27 12:51:06 Re: Push down more full joins in postgres_fdw