Re: patch: SQL/MED(FDW) DDL

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: SAKAMOTO Masahiko <sakamoto(dot)masahiko(at)oss(dot)ntt(dot)co(dot)jp>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: SQL/MED(FDW) DDL
Date: 2010-09-29 07:56:48
Message-ID: 20100929165647.9FC3.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Sep 2010 10:26:42 -0400
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko
> <sakamoto(dot)masahiko(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> > ?http://wiki.postgresql.org/wiki/SQL/MED
> With regard to what is written here, it strikes me that it would be an
> extremely bad idea to try to mix reloptions or attoptions with
> fdwoptions. fdwoptions are options to be passed transparently to the
> fdw to handle as it likes; rel/attoptions affect the behavior of PG.

In current patch, fdwoptions for relations have been separated from
reloptins by introducing pg_foreign_table catalog. As mentioned in
wiki, integration into rel/attoptions is nothing but an idea, we're
willing to add pg_foreign_attribute catalog which has columns:

farelid oid not null
faattnum smallint not null
faoptions text[] not null

Though this catalog has only fdwoptions as non-key value.
Or, adding attrfdwoptions column to pg_attribute catalog is better?

> I think the section about WHERE clause push-down is way off base.
> First, it seems totally wrong to assume that the same functions and
> operators will be defined on the remote side as you have locally;
> indeed, for CSV files, you won't have anything defined on the remote
> side at all. You need some kind of a discovery mechanism here to
> figure out which quals are push-downable. And it should probably be
> something generic, not a bunch of hard-wired rules that may or may not
> be correct in any particular case. What if the remote side is a
> competing database product that doesn't understand X = ANY(Y)?
> Second, even if a functions or operators does exist on both sides of
> the link, how do you know whether they have compatible semantics?
> Short of solving the entscheidungsproblem, you're not going to be able
> to determine that algorithmically, so you need some kind of mechanism
> for controlling what assumptions get made. Otherwise, you'll end up
> with queries that don't work and no way for the user to fix it.

First of all, WHERE clause push-down ideas written in wiki are just
for FDW for PostgreSQL, implicitly same version, so we have assumed
that the remote side has same syntax/semantics. WHERE clause
push-down is implemented in postgresql_fdw, and optimizer/planner are
not .

The postgresql_fdw pushes down WHERE clause with following steps.
* scans all quals in the PlanState and pickup remote-able quals
* for each remote-able quals,
* adds WHERE clause to remote SQL statement
* remove the qual from PlanState node to avoid duplicate
evaluation at upper layer ExecScan()

If the foreign data wrapper didn't support WHERE clause push-down,
such as CSV-wrapper, the wrapper can retrieve all tuples from remote
side and pass them to upper layer, and ExecScan() will filter the
tuples based on the quals in the PlanState.

By the way, in current implementation, all operators/functions in
SELECT clause will be evaluated on the local side always, so FDWs
should provide only plain column values.

> It seems to me that the API should allow PG to ask the FDW questions like this:
>
> - How many tuples are there on the remote side?
> - Here is a qual. Are you able to evaluate this qual remotely?
> - What are the startup and total costs of a sequential scan of the
> remote side with the following set of remotely executable quals?
> - Are there any indices available on the remote side, and if so what
> are there names and which columns do they index in which order
> (asc/desc, nulls first/last)?
> - What are the startup and total costs of an index scan of the remote
> side using the index called $NAME given the following set of remotely
> executable quals?
>
> and, as you mentIon:
>
> - Please update pg_statistic for this foreign table, if you have that
> capability.
>
> Then:
>
> - Begin a sequential scan with the following set of quals.
> - Begin an index scan using the index called X with the following set of quals.
> - Fetch next tuple.
> - End scan.
>
> Maybe that's too much for a first version but if we're not going to
> deal with the problems in a general way, then we ought to not deal
> with them at all, rather than having hacky rules that will work if
> your environment is set up in exactly the way the code expects and
> otherwise break horribly.
Using remote indexes might be very effective, but I think there are
many issues.

For instance, how can planner/optimizer know whether the foreign table
has been indexed or not? Checking remote catalogs for each scan must
be a bad idea. HiRDB, Hitachi's dbms product, seems to have
introduced FOREIGN INDEX for that purpose.

I'll consider about cost estimation and path/plan generation again,
and post later.

Regards, Shigeru Hanada
--
Shigeru Hanada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-09-29 07:56:57 Re: is sync rep stalled?
Previous Message Itagaki Takahiro 2010-09-29 07:56:33 operator dependency of commutator and negator