use_remote_estimate usage for join pushdown in postgres_fdw

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: use_remote_estimate usage for join pushdown in postgres_fdw
Date: 2015-12-11 09:44:30
Message-ID: CAFjFpRepSC2e3mZ1uYSopJD6R19fOZ0dNNf9Z=gnyKSB6wGk5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
postgres_fdw documentation says following about use_remote_estimate (
http://www.postgresql.org/docs/devel/static/postgres-fdw.html)
--
use_remote_estimate
This option, which can be specified for a foreign table or a foreign
server, controls whether postgres_fdw issues remote EXPLAIN commands to
obtain cost estimates. A setting for a foreign table overrides any setting
for its server, but only for that table. The default is false.
--

I am trying to see, how should we use this option in the context of join
pushdown and for
that matter any pushdown involving more than one table.

I came up with following arguments
1. Foreign base relations derive their use_remote_estimate setting either
from the server setting or the per table setting. A join between two
foreign relations should derive its use_remote_estimate setting from the
joining relations (recursively). This means that we will use EXPLAIN to
estimate costs of join if "all" the involved base foreign relations have
use_remote_estimate true (either they derive it from the server level
setting or table level setting).

2. Similar to 1, but use EXPLAIN to estimate costs if "any" of the involved
base foreign relations have use_remote_estimate is true.

3. Since join between two foreign relations is not a table level
phenomenon, but a server level phenomenon, we should use server level
setting. This means that we will use EXPLAIN output to estimate costs of
join if the foreign server has use_remote_estimate true, irrespective of
the setting for individual foreign relations involved in that join.

Unfortunately the documentation and comments in code do not say much about
the intention (i.e. why and how is this setting expected to be used) of
this setting in the context or server.

The intention behind server level setting is more confusing. It does not
override table level setting, so it is not intended to be used for a
prohibitive reason like e.g. server doesn't support EXPLAIN the way it will
be interpreted locally. It seems to act more like a default in case table
level setting is absent. User may set table level use_remote_estimate to
true, if cost of EXPLAIN is very small compared to that of table scan (with
or without indexes) or adding conditional clauses to the query alters the
costs heavily that the cost of EXPLAIN itself is justified. But I can be
wrong about these intentions.

If we go by the above intention behind table level setting, 2nd argument
makes more sense as the table for which use_remote_estimate is true, can
change the cost of join heavily because of the clauses in the join and it's
better to get it from the foreign server than guessing it locally.

Comments/suggestions are welcome.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-12-11 09:44:58 Re: Remaining 9.5 open items
Previous Message Andres Freund 2015-12-11 09:34:13 Re: Error with index on unlogged table