Re: patch: SQL/MED(FDW) DDL

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: SAKAMOTO Masahiko <sakamoto(dot)masahiko(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: SQL/MED(FDW) DDL
Date: 2010-09-27 14:05:07
Message-ID: 20100927230506.C6A4.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

On Mon, 27 Sep 2010 15:50:34 +0900
SAKAMOTO Masahiko <sakamoto(dot)masahiko(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Right. In any case, I should clearify what this API could cover
> by this patch and what could not.
> # And also how far I and my collaborator can implement..
>
> As Itagaki points out, we have two types of FDW implementations in
> progress: flat file wrapper and postgresql wrapper. these items are
> described in wiki:
> http://wiki.postgresql.org/wiki/SQL/MED
>
> But it may be hard to get what this fdw routines 'could' provide,
> so I and my collaborator is now summarizing these items to discuss and
> ask for help. Anyway I think these items are needed to discuss further.
> - API design that is used in executor for a single external table.
> - simple FDW implements to clearfy what this API could provide.

To discuss how the SQL/MED patches should be, we made summary of
current SQL/MED proposal.

* What the current SQL/MED patches provide

Recent patch fdw_select20100917 consists of some parts: I/F of
FDW, new executor node ForeignScan, connection caching, and FDW
for PostgreSQL. FDW for flat file is not included. All of them
are based on DDLs and catalogs which are provided by
fdw_table20100917 patch.

Itagaki has summarized the details of fdw_table20100917 patch
well. Thanks for review.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01653.php

The I/F of FDW is defined as struct FdwRoutine, set of API
functions which are called from backend modules. The APIs are
designed to support only scanning a foreign at once. Path/Plan
optimizations like JOIN/UNION push-down are out of scope. Such
optimizations require planner hook as mentioned by Itagaki before.
In short, our current goal is to implement SeqScan for foreign
tables.

ForeignScan node is an executor node which is like SeqScan node
for local table. Optimizer generates T_ForeignScan path instead
of T_SeqScan path for a foreign table in set_plain_rel_pathlist()
if the RangeTblEntry was a foreign table. Index paths and
tidscan paths are never generated for foreign tables.

ForeignScanState and FdwReply are introduced to represent the
status of a foreign scan. ForeignScanState is a subclass of
ScanState, and FdwReply is a abstract type which is used to pass
FDW-specific data between API calls.

* Details of FDW API

FDWs should implement HANDLER function which returns a pointer to
a FdwRoutine instance which has pointers to actual functions.

struct FdRoutine {

/*
* ConnectServer() will be called from ExecInitForeignScan()
* if the backend has no connection which can be used to
* execute the foreign query for the foreign table.
* FDW should establish a connection between foreign server
* and return it with casting to pointer to FSConnection
* (abstract connection type). If the FDW doesn't need any
* connection, returning NULL is OK.
* The arguments, server and user, can be used to extract
* connection information.
*/
FSConnection* (*ConnectServer)(ForeignServer *server,
UserMapping *user);

/*
* FreeFSConnection() will be called when backend dies or
* DISCARD ALL command was executed.
* FDW should close connection gracefully and free resources
* if any.
*/
void (*FreeFSConnection)(FSConnection *conn);

/*
* Open() will be called from ExecInitForeignScan().
* FDW should initialize ForeignScanState, internal state of
* a foreign scan, and ready to return tuple in next Iterate()
* call.
* For instance, FDW for PostgreSQL only generate SQL from
* ScanState. And we implemented WHERE clause push-down here.
*/
void (*Open)(ForeignScanState *scanstate);

/*
* Iterate() will be called from ExecForeignScan() when the
* executor requests next tuple.
* For instance, FDW for PostgreSQL executes foreign query at
* first call and stores all results into TupleStore, and
* returns each tuple for each Iterate() call.
*/
void (*Iterate)(ForeignScanState *scanstate);

/*
* Close() will be called from ExecEndForeignScan().
* FDW should free resources for FdwReply if any.
*/
void (*Close)(ForeignScanState *scanstate);

/*
* ReOpen() will be called from ExecForeignReScan() when the
* foreign scan should be reseted to scan whole data from the
* head again.
* For instance, FDW for PostgreSQL frees current result set
* to make next Iterate() call to execute foreign query again.
*/
void (*ReOpen)(ForeignScanState *scanstate);

};

Maybe FdwRoutine should have more APIs to support essential
features.

The startup/total cost of scanning a foreign table are fixed to
10000.0 in current patch. They are groundless values, they just
say that scanning foreign table costs more than scanning local
table. The cost should be estimated from statistics in
pg_statistic and pg_class. This function has not been
implemented yet.

/*
* UpdateStats() will be called from ANALYZE routine when
* statistics of a foreign table should be updated.
*
* Some more parameters may be needed.
*
* For instance, FDW for PostgreSQL will retrieve
* pg_statistic and pg_class from foreign server and store
* them in local catalog. How to represent overhead of
* foreign query is still issue.
*/
void (*UpdateStats)(Oid relid);

To support path/plan optimization, some APIs should be added, but
we think that simple FDWs can be implemented with APIs above.

* Next action

To focus on API discussion, we are going to simplify and
re-submit patches.

* Fix some mistakes pointed out by Itagaki.
* Make FDW for PostgreSQL independent contrib module.
Currently it shares connection management codes with
contrib/dblink.
* Add simple FDW for flat file (maybe CSV ?) into core to
implement regression tests for FOREIGN TABLE DDLs and foreign
table query. In the future, the FDW for flat file can be
integrated with COPY FROM.

Any comments/questions are welcome.

Regards,
--
Shigeru Hanada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2010-09-27 14:09:28 Re: Improving prep_buildtree used in VPATH builds
Previous Message Dimitri Fontaine 2010-09-27 14:03:55 Re: gist access methods parameter types