Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Thom Brown <thom(at)linux(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-05-10 14:44:21
Message-ID: CA+TgmoaP+spf0L8w6=XZbks1LrRa4f6o0xJmDPVpqTPDxNxwEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 9, 2015 at 1:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I originally wanted to go quite the other way with this and check for
>> join pushdown via handler X any time at least one of the two relations
>> involved used handler X, even if the other one used some other handler
>> or was a plain table. In particular, it seems to me quite plausible
>> to want to teach an FDW that a certain local table is replicated on a
>> remote node, allowing a join between a foreign table and a plain table
>> to be pushed down.
>
> If we did do something like that, I think a saner implementation would
> involve substituting a foreign table for the local one earlier, via view
> expansion. So by the time we are doing join planning, there would be no
> need to consider cross-server joins anyway.

Huh? You can't do this at rewrite time; it's very fundamentally a
planning problem. Suppose the user wants to join A, B, and C, where A
is a plain table, B is a plain table that is replicated on a foreign
server, and C is a foreign table on that same foreign server. If we
decide to join B to C first, we probably want to push down the join,
although maybe not, if we estimate that B JOIN C will have more rows
than C. If we decide to join A to B first, we want to use the local
copy of B.

>> This infrastructure can't be used that way anyhow,
>> so maybe there's no harm in tightening it up, but I'm wary of
>> circumscribing what FDW authors can do.
>
> Somebody who's really intent on shooting themselves in the foot can always
> use the set_join_pathlist_hook to inject paths for arbitrary joins.
> The FDW mechanism should support reasonable use cases without undue
> complication, and I doubt that what we've got now is adding anything
> except complication and risk of bugs.
>
> For the archives' sake, let me lay out a couple of reasons why an FDW
> that tried to allow cross-server joins would almost certainly be broken,
> and broken in security-relevant ways. Suppose for instance that
> postgres_fdw tried to be smart and drill down into foreign tables' server
> IDs to allow joining of any two tables that have the same effective host
> name, port, database name, user name, and anything else you think would be
> relevant to its choice of connections. The trouble with that is that the
> user mapping is context dependent, in particular one local userID might
> map to the same remote user name for two different server OIDs, while
> another might map to different user names. So if we plan a query under
> the first userID we might decide it's okay to do the join remotely.
> Then, if we re-use that plan while executing as another userID (which is
> entirely possible) what will probably happen is that the remote join
> query will get sent off under one or the other of the remote usernames
> associated with the second local userID. This could lead to either a
> permission failure, or a remote table access that should not be allowed
> to the current local userID. Admittedly, such cases might be rare in
> practice, but it's still a security hole. Also, even if the FDW is
> defensive enough to recheck full matching of the tables' connection
> properties at execution time, there's not much it could do about the
> situation except fail; it couldn't cause a re-plan to occur.
>
> For another case, we do not have any mechanism whereby operations like
> ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if
> the two tables' connection properties matched at plan time, there's no
> guarantee that they still match at execution. This is probably not a
> security hole (at least not if you assume foreign-server owners are
> trusted users), but it's still a bug that exists only if someone tries
> to allow cross-server joins.
>
> For these reasons, I think that if an FDW tried to be laxer than "tables
> must be on the same pg_foreign_server entry to be joined", the odds
> approach unity that it would be broken, and probably dangerously broken.
> So we should just make that check for the FDWs. Anybody who thinks
> they're smarter than the average bear can use set_join_pathlist_hook,
> but they are probably wrong.

Drilling down into postgres_fdw's connection properties seems pretty
silly; the user isn't likely to create two SERVER objects that are
identical and then choose which one to use at random, and if they do,
they deserve what they get. The universe of FDWs, however, is
potentially bigger than that. What does a server even mean for
file_fdw, for example? I can't think of any reason why somebody would
want to implement joins inside file_fdw, but if they did, all the
things being joined would be local files, so the server ID doesn't
really matter. Now you may say that's a silly use case, but it's less
obviously silly if the files contain structured data, as with
cstore_fdw, yet the server ID could still be not especially relevant.
Maybe you've got servers representing filesystem directories; that
shouldn't preclude cross "server" joins.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Emre Hasegeli 2015-05-10 14:57:12 Re: BRIN range operator class
Previous Message Robert Haas 2015-05-10 14:30:43 Re: multixacts woes