A different approach to extension NO USER DATA feature

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: A different approach to extension NO USER DATA feature
Date: 2011-02-06 18:23:35
Message-ID: 9163.1297016615@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

As I work through the extensions patch, the aspect of it that I like the
least is the NO USER DATA clause and related functions. I think it's
badly designed, badly implemented, and doesn't solve the problem.
If we accept it as-is, we'll soon have to rework it, and at that point
we'll be left with a large backwards-compatibility issue.

To recap: the problem we need to solve is the possibility that an
extension has configuration tables whose contents might get altered
within a particular database; if so, we need to have pg_dump dump out
those alterations so that the configuration will be preserved. This
isn't just an academic issue; contrib/tsearch2 for example had tables
exactly like that, before it got absorbed into core, and I believe
PostGIS has such today.

The solution offered by the submitted patch looks like this:

1. The CREATE EXTENSION command has an optional clause NO USER DATA,
which users are not supposed to include when initially loading an
extension, but which pg_dump will always include when emitting CREATE
EXTENSION in dump scripts.

2. There is a function pg_extension_with_user_data() that an extension's
SQL script can call to detect whether NO USER DATA was specified.

3. There is a function pg_extension_flag_dump(oid) that an extension's
SQL script can call to mark a created object as (sort of) not part of
the extension after all, so that pg_dump will dump that object anyway.

The way that you'd use these features to solve the problem is to make
the extension's SQL script flag the configuration table for dumping, and
also avoid creating the table or loading any predefined entries into it
if NO USER DATA is active. (Which seems to mean that that option is
named backwards, but I digress.)

IMO this approach has got numerous serious problems:

1. Because the flag operates at the whole-SQL-object level, the only
way to have a configuration table is to exclude the whole table
definition from being treated as part of the extension. This makes
it impossible to upgrade the table definition (eg, add columns or
indexes) when installing a new version of the extension. This also
makes it really easy to break the extension during selective restores
... there's no assurance the table will get created at all.

2. Also, because we're working at the whole-SQL-object level, there's
no good way to deal with the possibility that the configuration table
contains some rows created by the extension itself and others added by
the user. All such rows will get dumped and reloaded separately from
the extension, meaning there's no way to upgrade the extension-provided
initial data either.

3. Use of pg_extension_with_user_data() requires the extension SQL
script to contain conditional logic, which is not easy unless you want
to assume plpgsql is available. Which extensions really should not do.
(Once again: plpgsql may be installed by default, but that doesn't mean
it's guaranteed to be present.)

4. If the SQL script uses pg_extension_with_user_data() to decide not to
create the table, then it has no way to apply pg_extension_flag_dump,
with the result that after a dump and reload cycle, the configuration
table doesn't appear to be related to the extension at all. Thus the
user has another way to break the extension, which is to accidentally
drop the configuration table.

So basically I think this approach doesn't work, can't be made to
work, and imposes unreasonable burdens on extension authors anyway.
If we're going to support user-modifiable configuration tables,
we need to do it a different way.

After a bit of thought I believe that we can fix this if we are willing
to teach pg_dump explicitly about extension configuration tables.
The behavior we want for those is for the table schema definition to
never be dumped (the table should always be created by CREATE EXTENSION),
but for some subset of the table data to get dumped, excluding any
system-provided rows. An extension that wants to make use of that
ability would probably need to add a boolean column "system_data" or
something similar to its configuration tables. Having done so,
the extension's SQL script could call a function that identifies the
configuration table and tells how to decide which rows to dump,
along the lines of

pg_extension_partial_dump (table_name regclass, where_condition text)

for example,

SELECT pg_extension_partial_dump('my_table', 'WHERE NOT system_data');

I'm envisioning that this function would add that information to the
pg_extension entry, and then pg_dump would use it to select out a subset
of rows that it would dump as user data, even though it's not dumping
the table definition.

One interesting point is that for this to work during pg_upgrade,
we'd need to include those user-data rows in the upgrade dump, otherwise
they'd get lost. So at least for pg_upgrade, those rows need to be
considered schema not data. I'm not sure if we should handle them that
way all the time or whether pg_dump should special-case this for binary
upgrades.

I've not attempted to implement this idea, but offhand it looks like
it would take a day or two's work, which seems well worthwhile to
expend now in the hope of getting this feature right the first time.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-02-06 18:30:02 Re: SSI patch version 14
Previous Message Stephen Frost 2011-02-06 18:19:38 Re: REVIEW: Determining client_encoding from client locale