Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Thom Brown <thom(at)linux(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-05-09 12:49:53
Message-ID: CADyhKSUo1WHMQiy7d3ddNXuU1HMqxUTqYYZQ6cO5e+qsQ_M6+Q@mail.gmail.com
Lists: pgsql-hackers

2015-05-09 11:21 GMT+09:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... btw, I just noticed something that had escaped me because it seems so
>> obviously wrong that I had not even stopped to consider the possibility
>> that the code was doing what it's doing. To wit, that the planner
>> supposes that two foreign tables are potentially remote-joinable if they
>> share the same underlying FDW handler function. Not the same server, and
>> not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for
>> the handler function. I think this is fundamentally bogus. Under what
>> circumstances are we not just laying off the need to check same server
>> origin onto the FDW? How is it that the urgent need for the FDW to check
>> for that isn't even mentioned in the documentation?
>>
>> I think that we'd really be better off insisting on same server (as in
>> same pg_foreign_server OID), hence automatically same FDW, and what's
>> even more important, same user mapping for any possible query execution
>> context. The possibility that there are some corner cases where some FDWs
>> could optimize other scenarios seems to me to be poor return for the bugs
>> and security holes that will arise any time typical FDWs forget to check
>> this.
>
> I originally wanted to go quite the other way with this and check for
> join pushdown via handler X any time at least one of the two relations
> involved used handler X, even if the other one used some other handler
> or was a plain table. In particular, it seems to me quite plausible
> to want to teach an FDW that a certain local table is replicated on a
> remote node, allowing a join between a foreign table and a plain table
> to be pushed down. This infrastructure can't be used that way anyhow,
> so maybe there's no harm in tightening it up, but I'm wary of
> circumscribing what FDW authors can do. I think it's better to be
> rather expansive in terms of when we call them and let them return
> without doing anything some of the time than to define the situations
> in which we call them too narrowly and end up ruling out interesting
> use cases.
>
Joining a foreign table with a local relation that is replicated on the
remote side is probably a relatively minor case. Even if the coarse check on
the foreign server ID means GetForeignJoinPaths is not invoked, an FDW driver
can still implement arbitrary logic of its own through set_join_pathlist_hook,
at its own risk, can't it?

The attached patch changes the logic that checks whether two foreign
relations are joinable. As discussed upthread, it compares the foreign
server ID instead of the handler function.
build_join_rel() sets fdw_server on the join's RelOptInfo when the inner and
outer foreign relations have the same value, which in turn allows
add_paths_to_joinrel() to call GetForeignJoinPaths.
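
To make the rule above concrete, here is a minimal standalone sketch. It is
not the patch itself and it does not use the PostgreSQL headers: RelModel,
join_fdw_server() and should_call_foreign_join_paths() are hypothetical
names made up for illustration, and only the fdw_server field and the
"same server ID" rule are taken from the description above.

```c
/* Simplified model of the joinability check; NOT PostgreSQL source code. */
#include <stdbool.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for RelOptInfo, reduced to the one field discussed. */
typedef struct RelModel
{
    Oid fdw_server;     /* foreign server OID, InvalidOid for a plain table */
} RelModel;

/*
 * Models the check described for build_join_rel(): the join relation
 * inherits a server OID only when both inputs are foreign relations on
 * the same foreign server. Otherwise it gets InvalidOid.
 */
Oid
join_fdw_server(const RelModel *outer, const RelModel *inner)
{
    if (outer->fdw_server != InvalidOid &&
        outer->fdw_server == inner->fdw_server)
        return outer->fdw_server;
    return InvalidOid;
}

/*
 * Models the decision described for add_paths_to_joinrel(): the FDW's
 * GetForeignJoinPaths callback is considered only when the join relation
 * carries a valid server OID.
 */
bool
should_call_foreign_join_paths(const RelModel *joinrel)
{
    return joinrel->fdw_server != InvalidOid;
}
```

Under this model, two foreign tables on the same server are candidates for
GetForeignJoinPaths, while a join of tables on different servers, or of a
foreign table and a plain table, is not. An extension that wants to handle
those cases would have to rely on set_join_pathlist_hook instead.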

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

Attachment Content-Type Size
custom-join-fdw-pushdown-check-by-server.patch application/octet-stream 9.3 KB
