Re: [v9.3] writable foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, "Alexander Korotkov *EXTERN*" <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: [v9.3] writable foreign tables
Date: 2012-11-19 17:07:30
Message-ID: CADyhKSXdG7jPP_umGZjv1SD_Und9Xmn+6wCy=LphFKoHi=VW0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012/11/19 Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>:
> Kohei KaiGai wrote:
>>> I am not so happy with GetForeignRelInfo:
>>> - The name seems ill-chosen from the FDW API side.
>>> I guess that you chose the name because the function
>>> is called from get_relation_info, but I think the name
>>> should be more descriptive for the FDW API.
>>> Something like PlanForeignRelRowid.
>>
>> Indeed, GetForeignRelInfo might give misleading impression
>> as if this routine collects widespread information about
>> target relation. So, how about GetForeignRelWidth() instead?
>
> That would be better for the function as it is now.
>
>>> - I guess that every FDW that only needs "rowid" will
>>> do exactly the same as your fileGetForeignRelInfo.
>>> Why can't that be done in core?
>>> The function could pass an AttrNumber for the rowid
>>> to the FDW, and will receive a boolean return code
>>> depending on whether the FDW plans to use rowid or not.
>>> That would be more convenient for FDW authors.
>>
>> This design tries to kill two-birds with one-stone.
>> It enables to add multiple number of pseudo-columns,
>> not only "rowid", and makes possible to push-down
>> complex calculation of target list into external computing
>> resource.
>>
>> For example, when user gives the following query:
>>
>> SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable
>>
>> it contains a complex calculation in the target-list,
>> thus, it also takes CPU cycles of local process.
>>
>> If we can replace the "((c1 - c2) * (c2 - c3))^2" by
>> a reference to a pseudo-column that also references
>> the calculation result on external node, it effectively
>> off-load CPU cycles.
>>
>> In this case, all we need to do is (1) acquire a slot
>> for pseudo-column at GetForeignRelInfo (2) replace
>> TargetEntry::expr by Var node that reference this
>> pseudo-column.
>>
>> It makes sense for performance optimization, so I don't
>> want to restrict this handler for "rowid" only.
>
> I understand.
>
> But I think that you still can do that with the change that
> I suggest. I suggest that GetForeignRelInfo (or whatever the
> name ends up being) gets the AttrNumber of the proposed "rowid"
> column in addition to the parameters you need for what
> you want to do.
>
> Then nothing would keep you from defining those
> pseudo-columns. But all the setup necessary for the "rowid"
> column could be moved out of the FDW. So for the 99% of all
> FDW which are only interested in the rowid, things would
> get much simpler and they don't all have to implement the
> same code.
>
> Did I make clear what I mean?
> Would that be difficult?
>
All we have to do at get_relation_info() to deal with pseudo-
columns (including alternatives of complex calculation, not
only "rowid") is just expansion of rel->max_attr.
So, if FDW is not interested in something except for "rowid",
it can just inform the caller "Yeah, we need just one slot for
a pseudo-column of rowid". Otherwise, it can return another
value to acquire the slot for arbitrary pseudo-column.
I don't think it is a problematic design.

However, I'm skeptical 99% of FDWs don't support target-list
push-down. At least, it was very desired feature when I had
a talk at PGconf.EU last month. :-)

So, if we rename it to GetForeignRelWidth, is it defined as
follows?

extern AttrNumber
GetForeignRelWidth(PlannerInfo *root,
RelOptInfo *baserel,
Oid foreigntableid,
bool inhparent,
List *targetList);

Right now, inhparent makes no sense because foreign table
does not support table inheritance, but it seems to me we
shall have this functionality near future.

>>> - I guess the order is dictated by planner steps, but
>>> it would be "nice to have" if GetForeignRelInfo were
>>> not the first function to be called during planning.
>>> That would make it easier for a FDW to support both
>>> 9.2 and 9.3 (fewer #ifdefs), because the FDW plan state
>>> will probably have to be created in the first API
>>> function.
>>
>> The baserel->fdw_private should be initialized to NULL,
>> so it can perform as a mark whether private data is already
>> constructed, or not.
>
> Right, if that pointer is pre-initialized to NULL, that
> should work. Forget my quibble.
>
>> In addition, I noticed my patch didn't update documentation stuff.
>> I also add mention about new handlers.
>
> I didn't get into documentation, comment and spelling issues since
> the patch was still called POC, but yes, eventually that would
> be necessary.
>
> Yours,
> Laurenz Albe

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-11-19 17:17:36 Re: [RFC] Fix div/mul crash and more undefined behavior
Previous Message Dimitri Fontaine 2012-11-19 17:05:26 Re: Dumping an Extension's Script