Re: pgsql_fdw, FDW for PostgreSQL server

From: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-03-27 11:14:09
Message-ID: 4F71A101.9060407@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the comments.

(2012/03/27 18:36), Albe Laurenz wrote:
>> 2) Defer planning stuffs as long as possible to clarify the role of
> each
>> function. Currently GetRelSize just estimates result rows from local
>> statistics, and GetPaths adds only one path which represents SeqScan
> on
>> remote side. As result of this change, PgsqlFdwPlanState struct is
>> obsolete.
>
> I see the advantage of being able to do all this locally, but
> I think there are a lot of downsides too:
> - You have an additional maintenance task if you want to keep
> statistics for remote tables accurate. I understand that this may
> get better in a future release.
> - You depend on the structure of pg_statistic, which means a potential
> incompatibility between server versions. You can add cases to
> pgsql_fdw_analyze to cater for changes, but that is cumbersome and
> will
> only help for later PostgreSQL versions connecting to earlier ones.
> - Planning and execution will change (improve, of course) between server
> versions. The local planner may choose an inferior plan based on a
> wrong assumption of how a certain query can be handled on the remote.
> - You have no statistics if the foreign table points to a view on the
> remote system.

Especially for 2nd and 4th, generating pg_statistic records without
calling do_analyze_rel() seems unpractical in multiple version
environment. As you pointed out, I've missed another
semantics-different problem here. We would have to use do_analyze_rel()
and custom sampling function which returns sample rows from remote data
source, if we want to have statistics of foreign data locally. This
method would be available for most of FDWs, but requires some changes in
core. [I'll comment on Fujita-san's ANALYZE patch about this issue soon.]

> My gut feeling is that planning should be done by the server which
> will execute the query.

Agreed, if selectivity of both local filtering and remote filtering were
available, we can estimate result rows correctly and choose better plan.

How about getting # of rows estimate by executing EXPLAIN for
fully-fledged remote query (IOW, contains pushed-down WHERE clause), and
estimate selectivity of local filter on the basis of the statistics
which are generated by FDW via do_analyze_rel() and FDW-specific
sampling function? In this design, we would be able to use quite
correct rows estimate because we can consider filtering stuffs done on
each side separately, though it requires expensive remote EXPLAIN for
each possible path.

>> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
>> and immutable expressions in local WHERE clause. This means that most
>> of numeric conditions can be pushed down, but conditions using
> character
>> types are not.
>
> I understand that this is simple and practical, but it is a pity that
> this excludes equality and inequality conditions on strings.
> Correct me if I am wrong, but I thought that these work the same
> regardless of the collation.

You are right, built-in equality and inequality operators don't cause
collation problem. Perhaps allowing them would cover significant cases
of string comparison, but I'm not sure how to determine whether an
operator is = or != in generic way. We might have to hold list of oid
for collation-safe operator/functions until we support ROUTINE MAPPING
or something like that... Anyway, I'll fix pgsql_fdw to allow = and !=
for character types.

Regards,
--
Shigeru Hanada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2012-03-27 11:32:16 Re: pgsql_fdw, FDW for PostgreSQL server
Previous Message Fujii Masao 2012-03-27 11:13:34 Re: PATCH: pg_basebackup (missing exit on error)