Re: per-column generic option

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: per-column generic option
Date: 2011-06-28 12:06:40
Message-ID: 4E09C3D0.9050802@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the long delay...

(2011/06/17 21:59), David Fetter wrote:
> On Fri, Jun 17, 2011 at 07:19:39PM +0900, Shigeru Hanada wrote:
>> (2011/06/17 8:44), David Fetter wrote:
>>> Sorry not to respond sooner.
>>>
>>> First, the per-column generic options are a great thing for us to
>>> have. :)
>>
>> Thanks for the comments. :-)
>>
>>> I have an idea I've been using for the next release of DBI-Link that
>>> has varying levels of data type mapping. In general, these mappings
>>> would be units of executable code, one in-bound, and one out-bound,
>>> for each of:
>>>
>>> Universe (everything, default "mapping" is the identity map, i.e. a no-op)
>>> Database type (e.g. MySQL)
>>> Instance (e.g. mysql://foo.bar.com:5432)
>>> Database
>>> Schema
>>> Table
>>> Column
>>
>> Some of them seem to be able to be mapped to FDW object, e.g. Database
>> to SERVER and Table to FOREIGN TABLE.
>
> Yes, I see there are a few missing. "Universe" doesn't really need
> much of anything, as far as I can tell, except if we wanted to do
> something that affected SQL/MED globally. Is that hierarchy otherwise
> OK?

Yes, maybe some levels in your hierarchy can be mapped to SQL/MED
objects, and you can store options with them.

Universe : N/A (I'm not sure but custom GUC might suit for this)
Database type : FOREIGN DATA WRAPPER
Instance : N/A
Database : SERVER
Schema : N/A
Table : FOREIGN TABLE
Column : column of FOREIGN TABLE(WIP)

> DB2 may have one more level between Instance and Database Type,
> that latter being the province of an individual FDW.

I'm not familiar with DB2, but it would be difficult to map such level
to one of existing SQL/MED object types.

>>> I didn't include row in the hierarchy because I couldn't think of a
>>> way to identify rows across DBMSs and stable over time.
>>>
>>> The finest-grain transformation that's been set would be the one
>>> actually used.

Yeah, I think it's generally convenient for users if a FDW allows to
override settings which were defined for object on upper level.
For instance, if I'm dealing many files which have same format, and if
we could set "format" option for file_fdw on the server, all I have to
do for each foreign table is to specify "filename". I think that that's
usual use case.

>>> Here's an example of a non-trivial mapping.
>>>
>>> Database type:
>>> MySQL
>>> Foreign data type:
>>> datetime
>>> PostgreSQL data type:
>>> timestamptz
>>> Transformation direction:
>>> Import
>>> Transformation:
>>> CASE
>>> WHEN DATA = '0000-00-00 00:00:00'
>>> THEN NULL
>>> ELSE DATA
>>> END
>>>
>>> Here, I'm making the simplifying assumption that there is a bijective
>>> mapping between data types.
>>>
>>> Is there some way to fit the per-column part of such a mapping into
>>> this scheme? We'd need to do some dependency tracking in order to be
>>> able to point to the appropriate code...
>>
>> IIUC, you are talking about using FDW options as storage of data
>> type mapping setting, or mapping definition itself, right? If so, a
>> foreign table needs to be created to use per-column FDW options.
>> Does it suit to your idea?
>
> Yes. The only mildly disturbing thing about how that would work is
> that "magic" key names would actually point to executable code, so
> there would be some kind of non-uniform processing of the options, and
> (possibly quite unlikely) ways to escalate privilege.

How are you planning to define a mapping for an object other than
column? ISTM that you need to combine N mappings for such object, N is
the number of distinct types used under the level, so FDW seems to have
to cover all kind of transformation. Maybe you need to retrieve options
from lower level to upper level, until you find one which is suitable
for the combination of types.

>> BTW, I couldn't get what you mean by "dependency tracking". You
>> mean the dependency between foreign column and local column? It
>> might include essence of your idea... Would you explain the detail?
>
> I think the dependency between the mapping between the foreign column
> and the local one is already handled. On that subject, it's possible
> to make an argument that this mapping might need to be expanded so
> that in general, M foreign columns map to N local ones (distinct M and
> N), but that's a research topic, so let's not worry about it now.
>
> The dependency tracking I have in mind is of the actual executable
> code. If the inbound mapping has what amounts to a pointer to a
> function, it shouldn't be possible to drop that function without
> CASCADE, and if we're caching such functions, the cache needs to be
> refreshed any time the function changes.

Agreed, such dependency would have to be maintained by the system.
Dependencies from column to FDW (through foreign table and server) have
been managed with pg_depend. Cache invalidation would be need to be
implemented by dbi-link.

Current dependency graph about SQL/MED objects is:

column -> foreign table ----> server -> FDW
user mapping _/

VALIDATOR function might be able to be used to maintain pg_depend
entries when options are set/changed/dropped via CREATE/ALTER, though
it's not main purpose of VALIDATOR.

regards,
--
Shigeru Hanada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2011-06-28 12:56:21 Re: pgsql: Branch refs/heads/REL9_1_STABLE was removed
Previous Message Kohei KaiGai 2011-06-28 08:17:51 Re: [v9.2] DROP Reworks Part.1 - Consolidate routines to handle DropStmt