Re: pgsql_fdw, FDW for PostgreSQL server

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Shigeru HANADA *EXTERN*" <shigeru(dot)hanada(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Kohei KaiGai" <kaigai(at)kaigai(dot)gr(dot)jp>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-03-27 09:36:54
Message-ID: D960CB61B694CF459DCFB4B0128514C207AA8AFF@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Shigeru HANADA wrote:
> I've implemented pgsql_fdw's own deparser and enhanced some features
> since last post. Please apply attached patches in the order below:

> Changes from previous version
> =============================
>
> 1) Don't use remote EXPLAIN for cost/rows estimation, so now planner
> estimates result rows and costs on the basis of local statistics such
as
> pg_class and pg_statistic. To update local statistics, I added
> pgsql_fdw_analyze() SQL function which updates local statistics of a
> foreign table by retrieving remote statistics, such as pg_class and
> pg_statistic, via libpq. This would make the planning of pgsql_fdw
> simple and fast. This function can be easily modified to handle
ANALYZE
> command invoked for a foreign table (Fujita-san is proposing this as
> common feature in another thread).
>
> 2) Defer planning stuffs as long as possible to clarify the role of
each
> function. Currently GetRelSize just estimates result rows from local
> statistics, and GetPaths adds only one path which represents SeqScan
on
> remote side. As result of this change, PgsqlFdwPlanState struct is
> obsolete.

I see the advantage of being able to do all this locally, but
I think there are a lot of downsides too:
- You have an additional maintenance task if you want to keep
statistics for remote tables accurate. I understand that this may
get better in a future release.
- You depend on the structure of pg_statistic, which means a potential
incompatibility between server versions. You can add cases to
pgsql_fdw_analyze to cater for changes, but that is cumbersome and
will
only help for later PostgreSQL versions connecting to earlier ones.
- Planning and execution will change (improve, of course) between server
versions. The local planner may choose an inferior plan based on a
wrong assumption of how a certain query can be handled on the remote.
- You have no statistics if the foreign table points to a view on the
remote system.

My gut feeling is that planning should be done by the server which
will execute the query.

> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
> and immutable expressions in local WHERE clause. This means that most
> of numeric conditions can be pushed down, but conditions using
character
> types are not.

I understand that this is simple and practical, but it is a pity that
this excludes equality and inequality conditions on strings.
Correct me if I am wrong, but I thought that these work the same
regardless of the collation.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2012-03-27 09:40:34 Re: Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
Previous Message Fujii Masao 2012-03-27 09:25:27 Improvement of log messages in pg_basebackup