Re: review: FDW API

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-31 13:00:55
Message-ID: 20110131220054.9B28.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 24 Jan 2011 15:08:11 +0200
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've gone through the code in a bit more detail now. I did a bunch of
> cosmetic changes along the way, patch attached. I also added a few
> paragraphs in the docs. We need more extensive documentation, but this
> at least marks the places where I think the docs need to go.
>
> Comments:

Thanks for the comments!

> * How can a FDW fetch only the columns required by the scan? The file
> FDW has to read the whole file anyhow, but it could perhaps skip calling
> the input function for unnecessary columns. But more importantly, with
> something like the postgresql_fdw you don't want to fetch any extra
> columns across the wire. I gather the way to do it is to copy
> RelOptInfo->attr_needed to private storage at plan stage, and fill the
> not-needed attributes with NULLs in Iterate. That gets a bit awkward,
> you need to transform attr_needed to something that can be copied for
> starters. Or is that information somehow available at execution phase
> otherwise?

I thought that RelOptInfo->reltargetlist, a list of Var, can be used
for that purpose. FdwPlan can copy it with copyObject(), and pass
it to Iterate through FdwPlan->fdw_private.

Then, postgresql_fdw would be able to retrieve only necessary columns,
or just use "NULL" for unnecessary columns in the SELECT clause to
avoid mapping values to columns. Each way would be decrease amount of
data transfer.

> * I think we need something in RelOptInfo to mark foreign tables. At the
> moment, you need to call IsForeignTable() which does a catalog lookup.
> Maybe a new RTEKind, or a boolean flag.

We can avoid catalog lookup with checking table type in get_relation_info()
and updating RelOptInfo->is_foreign_table if the target was a foreign
table.

> * Can/should we make ReScan optional? Could the executor just do
> EndScan+BeginScan if there's no ReScan function?

Right, we have enough information to call BeginScan again. Will fix.

> * Is there any point in allowing a FDW without a handler? It's totally
> useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in
> previous versions, and it allowed it, but it has always been totally
> useless so I don't think we need to worry much about
> backwards-compatibility here.

dblink (and possibly other external modules) uses FDW without a
handler.

> * Is there any use case for changing the handler or validator function
> of an existign FDW with ALTER? To me it just seems like an unnecessary
> complication.

AFAICS, the only case for that is upgrading FDW to new one without
re-creating foreign tables. I don't have strong opinion for this
issue, and it seems reasonable to remove ALTER feature in first
version.

> * IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
> experience with another DBMS that had this feature, the SQL being sent
> to the remote server was almost always the key piece of information that
> I was looking for in the query plans.

Agreed, will fix to show FDW-info always. Is it reasonable to show
"FDW-info" row even if a FDW set explainInfo to NULL?

> * this check in expand_inherited_rtentry seems misplaced:
>
> > /*
> > * SELECT FOR UPDATE/SHARE is not allowed on foreign tables because
> > * they are read-only.
> > */
> > if (newrelation->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
> > lockmode != AccessShareLock)
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > errmsg("SELECT FOR UPDATE/SHARE is not allowed with foreign tables")));
>
> I don't understand why we'd need to do that for inherited tables in
> particular. And it's not working for regular non-inherited foreign tables:
>
> postgres=# SELECT * FROM filetbl2 FOR UPDATE;
> ERROR: could not open file "base/11933/16397": No such file or directory

It's a remnants of table inheritance support for foreign tables. This
check should be removed from here, and another check should be added
to avoid above error.

> * Need to document how the FDW interacts with transaction
> commit/rollback. In particular, I believe EndScan is never called if the
> transaction is aborted. That needs to be noted explicitly, and need to
> suggest how to clean up any external resources in that case. For
> example, postgresql_fdw will need to close any open cursors or result sets.

I agree that resource cleanup is an important issue when writing FDW.
FDW should use transaction-safe resources like VirtualFile, or use
ResourceOwner callback mechanism. Is it reasonable to add new page
under "Chapter 35. Extending SQL"?

> In general, I think the design is sound. What we need is more
> documentation. It'd also be nice to see the postgresql_fdw brought back
> to shape so that it works against this latest version of the api patch.

I'll post FDW API patches which reflect comments first, and then I'll
rebase postgresql_fdw against them.

Regards,
--
Shigeru Hanada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-01-31 13:26:15 Re: SSI patch version 14
Previous Message Dan Ports 2011-01-31 11:50:28 Re: SSI patch version 14