Re: postgres_fdw bug in 9.6

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw bug in 9.6
Date: 2017-01-10 07:08:01
Message-ID: bb597d1f-ad78-b47e-3255-cd3303662aa8@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/01/09 22:36, Ashutosh Bapat wrote:

I wrote:
>> Done. Attached is the new version of the patch.

> Why is this so?
> * If the outer cheapest-total path is parameterized by the inner
> * rel, we can't generate a nestloop path.

That parameterization means that for each row scanned from the inner
rel, the nestloop has to pass down the param values of the row into the
scan of the outer rel; but it's not possible for the nestloop to work
like that. You can find the same tests in match_unsorted_outer.

> * If either cheapest-total path is parameterized by the other
> * rel, we can't generate a hashjoin/mergejoin path.

The above applies to hashjoins and mergejoins. In addition, those joins
cannot pass down the param values of each outer-rel row into the
inner-rel scan, like nestloop joins, so the inner path mustn't be
parameterized by the outer rel in those joins. See the same tests in
sort_inner_and_outer and hash_inner_and_outer.

While re-reading the patch, I noticed this could be simplified:

! case JOIN_FULL:
! /*
! * If either cheapest-total path is parameterized by the other
! * rel, we can't generate a hashjoin/mergejoin path. (There's no
! * use looking for alternative input paths, since these should
! * already be the least-parameterized available paths.)
! */
! if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
! PATH_PARAM_BY_REL(inner_path, outerrel))
! return NULL;
! /* If proposed path is still parameterized, we can't use it. */
! required_outer = calc_non_nestloop_required_outer(outer_path,
! inner_path);

This would mean that both the outer and inner paths must be
unparameterized. Will simplify if it's ok.

> I don't think we need to provide details of what kind of path the function
> builds.
> + join plan. <literal>CreateLocalJoinPath</> builds a nested loop join
> + path for the specified join relation, except when the join type is
> + <literal>FULL</>, in which case a merge or hash join path is built.

Ok, will remove.

> I am not able to understand the code or the comment below
> +
> + /* Save first mergejoin information for possible use by the FDW */
> + if (outerkeys == all_pathkeys)
> + {
> + extra->mergeclauses = cur_mergeclauses;
> + extra->merge_pathkeys = merge_pathkeys;
> + extra->merge_outerkeys = outerkeys;
> + extra->merge_innerkeys = innerkeys;
> + }

In the case when mergejoin is allowed and we have merge clauses but no
hash clauses, CreateLocalJoinPath creates a mergejoin from the
cheapest-total paths for the outer and inner rels, by explicitly sorting
both the outer and inner rels. The above is for creating the mergejoin
in CreateLocalJoinPath; since sort_inner_and_outer also creates
mergejoins from the cheapest-total paths, the above code saves data
about one of mergejoins created there so that CreateLocalJoinPath uses
that data to create the mergejoin.

On second thought, I noticed that this could be simplified a bit as
well; since the output sort order (merge_pathkeys) of the mergejoin
implementing a FULL join is NIL (see build_join_pathkeys), we wouldn't
need to save that info in merge_pathkeys. Will simplify if it's ok.

> If the inner and/or outer paths are not ordered as required, we will need to
> order them. Code below doesn't seem to handle that case.
> /*
> * If the paths are already well enough ordered, we can
> * skip doing an explicit sort.
> */
> if (outerkeys &&
> pathkeys_contained_in(outerkeys, outer_path->pathkeys))
> outerkeys = NIL;
> if (innerkeys &&
> pathkeys_contained_in(innerkeys, inner_path->pathkeys))
> innerkeys = NIL;

I might be missing something, but if the outer/inner paths are already
well sorted, we wouldn't need an explicit sort, so we can set the
outerkeys/innerkeys to NIL safely. (You can find the same optimization
in try_mergejoin_path.)

Thanks for the review!

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-01-10 07:19:40 Re: Block level parallel vacuum WIP
Previous Message Michael Paquier 2017-01-10 06:52:42 Re: DROP FUNCTION of multiple functions