Re: FDW for PostgreSQL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FDW for PostgreSQL
Date: 2013-02-21 10:30:27
Message-ID: 3290.1361442627@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> writes:
> [ postgres_fdw.v5.patch ]

Applied with a lot of revisions.

There are still a number of loose ends and things that need to be
discussed:

I'm not at all happy with the planner support functions --- as designed,
that code is basically incapable of thinking about more than one remote
access path. It needs to be restructured and then extended to be able
to generate parameterized remote paths. The "local estimation" mode is
pretty bogus as well. I thought the patch could be committed anyway,
but I'm going to go back and work on that part later.

I doubt that deparseFuncExpr is doing the right thing by printing
casts as their underlying function calls. The comment claims that
this way is more robust but I rather think it's less so, because it's
effectively assuming that the remote server implements casts exactly
like the local one, which might be incorrect if the remote is a different
Postgres version. I think we should probably change that, but would like
to know what the argument was for coding it like this. Also, if this
is to be the approach to printing casts, why is RelabelType handled
differently?

As I mentioned earlier, I think it would be better to force the remote
session's search_path setting to just "pg_catalog" and then reduce the
amount of explicit schema naming in the queries --- any opinions about
that?

I took out the checks on collations of operators because I thought they
were thoroughly broken. In the first place, looking at operator names
to deduce semantics is unsafe (if we were to try to distinguish equality,
looking up btree opclass membership would be the way to do that). In the
second place, restricting only collation-sensitive operators and not
collation-sensitive functions seems just about useless for guaranteeing
safety. But we don't have any very good handle on which functions might
be safe to send despite having collatable input types, so taking that
approach would greatly restrict our ability to send function calls at all.

The bigger picture here though is that we're already relying on the user
to make sure that remote tables have column data types matching the local
definition, so why can't we say that they've got to make sure collations
match too? So I think this is largely a documentation issue and we don't
need any automated enforcement mechanism, or at least it's silly to try
to enforce this when we're not enforcing column type matching (yet?).

What might make sense is to try to determine whether a WHERE clause uses
any collations different from those of the contained foreign-column Vars,
and send it over only if not. That would prevent us from sending clauses
that explicitly use collations that might not exist on the remote server.
I didn't try to code this though.

Another thing that I find fairly suspicious in this connection is that
deparse.c isn't bothering to print collations attached to Const nodes.
That may be a good idea to avoid needing the assumption that the remote
server uses the same collation names we do, but if we're going to do it
like this, those Const collations need to be considered when deciding
if the expression is safe to send at all.

A more general idea that follows on from that is that if we're relying on
the user to be sure the semantics are the same, maybe we don't need to be
quite so tight about what we'll send over. In particular, if the user has
declared a foreign-table column of a non-built-in type, the current code
will never send any constraints for that column at all, which seems overly
conservative if you believe he's matched the type correctly. I'm not sure
exactly how to act on that thought, but I think there's room for
improvement there.

A related issue is that as coded, is_builtin() is pretty flaky, because
what's built-in on our server might not exist at all on the remote side,
if it's a different major Postgres version. So what we've got is that
the code is overly conservative about what it will send and yet still
perfectly capable of sending remote queries that will fail outright,
which is not a happy combination. I have no very good idea how to deal
with that though.

Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-02-21 11:27:48 Re: posix_fadvise missing in the walsender
Previous Message Andres Freund 2013-02-21 08:54:06 Re: Materialized views WIP patch