Re: [v9.3] writable foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, "Alexander Korotkov *EXTERN*" <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: [v9.3] writable foreign tables
Date: 2012-11-20 14:05:21
Message-ID: CADyhKSV68j_4SiQpM0wW4w+Vf64iBofEj1oD-autSuesHx6DQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012/11/20 Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>:
> Kohei KaiGai wrote:
>>>> This design tries to kill two-birds with one-stone.
>>>> It enables to add multiple number of pseudo-columns,
>>>> not only "rowid", and makes possible to push-down
>>>> complex calculation of target list into external computing
>>>> resource.
>>>>
>>>> For example, when user gives the following query:
>>>>
>>>> SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable
>>>>
>>>> it contains a complex calculation in the target-list,
>>>> thus, it also takes CPU cycles of local process.
>>>>
>>>> If we can replace the "((c1 - c2) * (c2 - c3))^2" by
>>>> a reference to a pseudo-column that also references
>>>> the calculation result on external node, it effectively
>>>> off-load CPU cycles.
>>>>
>>>> In this case, all we need to do is (1) acquire a slot
>>>> for pseudo-column at GetForeignRelInfo (2) replace
>>>> TargetEntry::expr by Var node that reference this
>>>> pseudo-column.
>>>>
>>>> It makes sense for performance optimization, so I don't
>>>> want to restrict this handler for "rowid" only.
>
>>> I understand.
>>>
>>> But I think that you still can do that with the change that
>>> I suggest. I suggest that GetForeignRelInfo (or whatever the
>>> name ends up being) gets the AttrNumber of the proposed "rowid"
>>> column in addition to the parameters you need for what
>>> you want to do.
>>>
>>> Then nothing would keep you from defining those
>>> pseudo-columns. But all the setup necessary for the "rowid"
>>> column could be moved out of the FDW. So for the 99% of all
>>> FDW which are only interested in the rowid, things would
>>> get much simpler and they don't all have to implement the
>>> same code.
>
>> All we have to do at get_relation_info() to deal with pseudo-
>> columns (including alternatives of complex calculation, not
>> only "rowid") is just expansion of rel->max_attr.
>> So, if FDW is not interested in something except for "rowid",
>> it can just inform the caller "Yeah, we need just one slot for
>> a pseudo-column of rowid". Otherwise, it can return another
>> value to acquire the slot for arbitrary pseudo-column.
>> I don't think it is a problematic design.
>>
>> However, I'm skeptical 99% of FDWs don't support target-list
>> push-down. At least, it was very desired feature when I had
>> a talk at PGconf.EU last month. :-)
>
> I agree that PostgreSQL should make this technique possible.
>
> My idea should not make this any more difficult.
>
>> So, if we rename it to GetForeignRelWidth, is it defined as
>> follows?
>>
>> extern AttrNumber
>> GetForeignRelWidth(PlannerInfo *root,
>> RelOptInfo *baserel,
>> Oid foreigntableid,
>> bool inhparent,
>> List *targetList);
>>
>> Right now, inhparent makes no sense because foreign table
>> does not support table inheritance, but it seems to me we
>> shall have this functionality near future.
>
> I am thinking of this declaration:
>
> extern bool
> GetForeignRelWidth(PlannerInfo *root,
> RelOptInfo *baserel,
> Oid foreigntableid,
> bool inhparent,
> List *targetList,
> AttrNumber rowidAttr);
>
> Let me illustrate my idea with some code.
>
> Here's your fileGetForeignRelInfo:
>
>
> static void
> fileGetForeignRelInfo(PlannerInfo *root,
> RelOptInfo *baserel,
> Oid foreigntableid,
> bool inhparent, List *targetList)
> {
> FileFdwPlanState *fdw_private;
> ListCell *cell;
>
> fdw_private = (FileFdwPlanState *)
> palloc0(sizeof(FileFdwPlanState));
>
> foreach(cell, targetList)
> {
> TargetEntry *tle = lfirst(cell);
>
> if (tle->resjunk &&
> tle->resname && strcmp(tle->resname, "rowid")==0)
> {
> Bitmapset *temp = NULL;
> AttrNumber anum_rowid;
> DefElem *defel;
>
> pull_varattnos((Node *)tle, baserel->relid, &temp);
> anum_rowid = bms_singleton_member(temp)
> + FirstLowInvalidHeapAttributeNumber;
> /* adjust attr_needed of baserel */
> if (anum_rowid > baserel->max_attr)
> baserel->max_attr = anum_rowid;
> defel = makeDefElem("anum_rowid",
> (Node *)makeInteger(anum_rowid));
> fdw_private->anum_rowid = defel;
> }
> }
> baserel->fdw_private = fdw_private;
> }
>
>
> I hope that this can be reduced to:
>
>
> static bool
> fileGetForeignRelRowid(PlannerInfo *root,
> RelOptInfo *baserel,
> Oid foreigntableid,
> bool inhparent,
> List *targetList,
> AttrNumber rowidAttr)
> {
> FileFdwPlanState *fdw_private;
> fdw_private = (FileFdwPlanState *)
> palloc0(sizeof(FileFdwPlanState));
>
> defel = makeDefElem("anum_rowid",
> (Node *)makeInteger(rowidAttr));
> fdw_private->anum_rowid = defel;
>
> baserel->fdw_private = fdw_private;
>
> return true; /* we'll use rowid, so please extend baserel->max_attr
> */
> }
>
>
> That wouldn't mean that the FDW cannot define any other
> pseudo-columns in this function, just the case of rowid
> would be simplified.
>
> Does that make any sense?
>
I think "bool" is not suitable data type to inform how many pseudo-columns
are needed for this scan on foreign-table.

Probably, it is helpful to provide a helper function that fetches an attribute-
number of pseudo "rowid" column from the supplied targetlist.
If we have GetPseudoRowidColumn() at foreign/foreign.c, the avove
routine can be rewritten as:

static AttrNumber
fileGetForeignRelWidth(PlannerInfo *root,
RelOptInfo *baserel,
Relation foreignrel,
bool inhparent, List *targetList)
{
FileFdwPlanState *fdw_private;
AttrNumber nattrs = RelationGetNumberOfAttributes(foreignrel);
AttrNumber anum_rowid;

fdw_private = palloc0(sizeof(FileFdwPlanState));
anum_rowid = GetPseudoRowidColumn(..., targetList);
if (anum_rowid > 0)
{
Assert(anum_rowid > nattrs);
fdw_private->anum_rowid
= makeDefElem("anum_rowid", (Node *)makeInteger(anum_rowid));
nattrs = anum_rowid;
}
baserel->fdw_private = fdw_private;

return nattrs;
}

In case when FDW drive wants to push-down other target entry into foreign-
side, thus, it needs multiple pseudo-columns, it is decision of the extension.
In addition, it does not take API change in the future, if some more additional
pseudo-column is required by some other new features.

How about your opinion?

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Przemek Lisowski 2012-11-20 14:07:53 C-function, don't load external dll file
Previous Message Alvaro Herrera 2012-11-20 13:52:52 Re: [v9.3] OAT_POST_ALTER object access hooks