Re: Getting sorted data from foreign server for merge join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting sorted data from foreign server for merge join
Date: 2015-11-20 15:44:07
Message-ID: CAFjFpRfD7niTunGnhHjJ+mDPssAEeQSdYHwHUso3apUm8p7ONg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 19, 2015 at 7:30 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Nov 17, 2015 at 8:33 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> >> Although I'm usually on the side of marking things as extern whenever
> >> we find it convenient, I'm nervous about doing that to
> >> make_canonical_pathkey(), because it has side effects. Searching the
> >> list of canonical pathkeys for the one we need is reasonable, but is
> >> it really right to ever think that we might create a new one at this
> >> stage? Maybe it is, but if so I'd like to hear a good explanation as
> >> to why.
> >
> > For a foreign table no pathkeys are created before creating paths for
> > individual foreign table since there are no indexes. For a regular table,
> > the pathkeys get created for useful indexes.
>
> OK.
>
> Could some portion of the second foreach loop inside
> get_useful_pathkeys_for_relation be replaced by a call to
> eclass_useful_for_merging? The logic looks very similar.
>

Yes. I have incorporated this change in the patch attached.

>
> More broadly, I'm wondering about the asymmetries between this code
> and pathkeys_useful_for_merging(). The latter has a
> right_merge_direction() test and a case for non-EC-derivable join
> clauses that aren't present in your code.

I wonder if we could just
> generate pathkeys speculatively and then test which ones survive
> truncate_useless_pathkeys(), or something like that. This isn't an
> enormously complicated patch, but it duplicates more of the logic in
> pathkeys.c than I'd like.
>

pathkeys_useful_for_merging(), truncate_useless_pathkeys() and host of
functions in that area are crafted to assess the usefulness of given
pathkeys which for regular tables are "speculated" from indexes on that
table. Foreign tables do not have indexes and neither we have information
about the indexes (if any) on foreign server, thus pathkeys to be assessed
are not readily available. Hence we need some other way of "speculating"
pathkeys for foreign tables. We can not just generate pathkeys because
there are infinite possible expressions on which pathkeys can be built. So,
we hunt for the expressions at various places like root_pathkeys, merge
joinable clauses etc. The seeming duplication of code is because the places
where we are hunting for useful pathkeys in case of foreign table are same
as the places used for assessing usefulness for pathkeys in case of regular
table. Thus in case of foreign tables, we always generate useful pathkeys,
which do not need any further assessment. For regular tables we have set of
pathkeys which need to be assessed. Because of this difference the code
differs in details.

right_merge_direction() compares whether a given pathkey (readily available
or derived from an index) has the same direction as required by
root->query_pathkeys or ascending direction. For foreign tables, the
pathkeys are themselves created from root->query_pathkeys, thus doesn't
require this assessment. The patch creates pathkeys other than those from
root->query_pathkeys in the ascending order (like other places in the code
eg. select_outer_pathkeys_for_merge()), which is what
right_merge_direction() assesses. So again we don't need to call
right_merge_direction().

non-EC-derivable join is interesting. Thanks for bringing it out. These are
mergejoinable clauses in an OUTER join, where columns from inner side can
be NULL, and thus not exactly equal. I will incorporate that change in the
patch. Again while doing so, we have to just pick up the right or left
equivalence class from a given RestrictInfo and don't need to assess it
further like pathkeys_useful_for_merging().

Added this change in the attached patch.

>
> I'm inclined to think that we shouldn't consider pathkeys that might
> be useful for merge-joining unless we're using remote estimates. It
> seems more speculative than pushing down a toplevel sort.
>

I agree with you but let me elaborate why I agree. The pathkeys we are
generating are heauristic in nature and thus may not be useful in the same
sense as pathkeys for regular tables. If use_remote_estimate is ON, the
planner will spend time in explaining multiple queries, but it will be in
better position to cost the usage. If use_remote_estimate is OFF, we will
altogether loose chances of merge join, which doesn't look good to me. But
a speculative costing done in this case can result in choosing wrong plan
similar to the same case as toplevel sort. But then choosing a wrong join
has more serious implications than estimating wrong cost for top level join.

>
> This patch seems rather light on test cases.
>
>
Added tests. Let me know if you have any specific scenario in mind.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_sort_all_pd_v3.patch text/x-diff 26.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-11-20 17:01:28 Re: [BUGS] BUG #13779: Inherited check constraint becomes non-inherited when related column is changed
Previous Message Craig Ringer 2015-11-20 14:20:40 Re: Selective logical replication