Getting sorted data from foreign server for merge join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Getting sorted data from foreign server for merge join
Date: 2015-11-06 04:54:44
Message-ID: CAFjFpRfeKHiCmwJ72p4=ZvuZRQsaU9tbfyW7vwr-5PPvrCbcQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
PFA patch to get data sorted from the foreign server (postgres_fdw)
according to the pathkeys useful for merge join.

For a given base relation (extendable to join when that becomes available
in postgres_fdw), the patch tries to find merge joinable clauses. It then
adds paths with pathkeys extracted out of these merge joinable clauses. The
merge joinable clauses form equivalence classes. The patch searches in
root->eq_classes for equivalence members belonging to the given relation.
For every such expression it creates a single member pathkey list and
corresponding path. The test postgres_fdw.sql has an existing join which
uses merge join. With this patch the data is sorted on the foreign server
than locally.

While mergejoinable clauses can be obtained from rel->joininfo as well. But
rel->joininfo contains other clauses as well and we need extra efforts to
remove duplicates if the same expression appears in multiple merge joinable
clauses.

Two joining relations can have multiple merge joinable clauses, requiring
multi-member pathkeys so that merge join is efficient to the maximum
extent. The order in which the expressions appears in pathkeys can change
the costs of sorting the data according to the pathkeys, depending upon the
expressions and the presence of indexes containing those expressions. Thus
ideally we would need to club all the expressions appearing in all the
clauses for given two relations and create paths with pathkeys for every
order of these expressions.That explodes the number of possible paths. We
may restrict the number of paths created by considering only certain orders
like sort_inner_and_outer(). In any case, costing such paths increases the
planning time which may not be worth it. So, this patch uses a heuristic
approach of creating single member pathkeys path for every merge joinable
expression.

The pathkeys need to be canonicalised using make_canonical_pathkey(), which
is a static function. I have added a TODO and comments in the patch
explaining possible ways to avoid "extern"alization of this function.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_sort_all_pd.patch text/x-diff 17.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2015-11-06 04:55:44 Re: Some questions about the array.
Previous Message Robert Haas 2015-11-06 04:47:20 Re: Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"