FDW-based dblink

From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: FDW-based dblink
Date: 2009-08-13 10:35:34
Message-ID: 20090813184143.7714.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is a proposal to integrate contrib/dblink and SQL/MED (foreign
data wrapper).

Dblink manages connections and transactions by itself at the moment,
but there are some benefits to split database connectors into FDW.
Dblink will uses those multiple connectors. For example, we will be
able to retrieve data from Oracle into PostgreSQL directly if we had
Oracle-connector.

New syntax in SQL
-------------------
CREATE FOREIGN DATA WRAPPER postgres
CONNECTOR pg_catalog.dblink_postgres;
or
CREATE FOREIGN DATA WRAPPER postgres
OPTIONS (connector 'pg_catalog.dblink_postgres')

We don't have to modify gram.y if we take the latter syntax, but need
to modify VALIDATORs to distinguish 'connector' and other options.
The 'connector' option should not be passed as connection string.

New interface in C
--------------------
pg_catalog.dblink_postgres is a function that havs folloing prototype:

Connection *connector(List *defElems);

The argument 'defElems' is a concatenated connection options
in FDW, server, and user-mapping.

Also new two interfaces will be introduced:

interface Connection /* represents PGconn */
{
void disconnect(self);
Cursor *open(self, query, fetchsize); /* for SELECT */
int64 exec(self, query); /* for UPDATE, INSERT, DELETE */
bool transaction_command(self, type);
}

interface Cursor /* represents PGresult and server-side cursor */
{
bool fetch(self, OUT values);
void close(self);
}

They have some methods implemented with function pointers. The benefit
of using function pointers is that we only have to export one connector
function to pg_proc.

The Cursor interface represents both result-set and server-side cursor.
PostgreSQL has SQL-level cursor, but there are some database that have
protocol-level cursor. This abstraction layer is needed for dblink to
handle connectors to other databases.

Other features
----------------
Present dblink is a thin wrapper of libpq, but some of my customers
want automatic transaction managements. Remote transactions are
committed with 2PC when the local transaction is committed.
To achieve it, I think we need on-commit trigger is needed,
but it is hard to implement with current infrastructure.
(That is one of the reason I proposed to merge dblink into core.)

Other considerations
----------------------
The proposed method is a SQL-based connector. There might be another
approach -- ScanKey-based connector. It is similar to the index access
method interface (pg_am). It takes relation id and scankeys instead of
SQL text. The scanKey-based approach will work better if we try to pass
WHERE-clause to an external database. However, I think we need SQL-based
interface in any case. ScanKey will be converted to SQL and passed
to an external database.

I have a prototype of the feature. I'd like to submit it for 8.5.
Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-08-13 10:47:50 trigger functions can only be called as triggers
Previous Message Pierre Frédéric Caillaud 2009-08-13 10:01:53 Re: COPY speedup