Re: Can we let extensions change their dumped catalog schemas?

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Can we let extensions change their dumped catalog schemas?
Date: 2023-01-11 18:27:29
Message-ID: CAAWbhmi9pPCwo74tGw7mnVDMHZDcyZx6yDEJDx1MYokbpk8cYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 10, 2023 at 7:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jacob Champion <jchampion(at)timescale(dot)com> writes:
> > Unless I'm missing something obvious (please, let it be that) there's no
> > way to do this safely. Once you've marked an internal table as dumpable,
> > its schema is effectively frozen if you want your dumps to work across
> > versions, because otherwise you'll try to restore that "catalog" data
> > into a table that has different columns. And while sometimes you can
> > make that work, it doesn't in the general case.
>
> I agree that's a problem, but it's not that we're arbitrarily prohibiting
> something that would work. What, concretely, do you think could be
> done to improve the situation?

Concretely, I think extensions should be able to invoke their update
scripts at some point after a dump/restore cycle, whether
automatically or manually.

> > 2) Provide a way to record the exact version of an extension in a dump.
>
> Don't really see how that helps?

If pg_dump recorded our extension using

CREATE EXTENSION timescaledb VERSION <original version>

then we'd be able to migrate the changed catalog post-restore, using a
standard ALTER EXTENSION ... UPDATE.

> I also fear that it will break
> a bunch of use-cases that work fine today, which are exactly the
> ones for which we originally defined pg_dump as *not* committing
> to a particular extension version.

Right, I think it would have to be opt-in. Say, a new control file
option dump_version or some such.

> It feels like what we really need here is some way to mutate the
> old format of an extension config table into the new format.

Agreed. We already provide mutation functions via the update scripts,
so I think both proposal 2 and 3 do that. I'm curious about your
opinion on option 3, since it would naively seem to make pg_dump do
_less_ work for a given extension.

> Simple addition of new columns shouldn't be a problem (in fact,
> I think that works already, or could easily be made to). If you
> want some ETL processing then it's harder :-(.

One sharp edge for the add-a-new-column case is where you give the new
column a default, and you want all of the old migrated rows to have
some non-default value to handle backwards compatibility. (But that
case is handled trivially if you _know_ that you're performing a
migration.)

> Could an ON INSERT
> trigger on an old config table transpose converted data into a
> newer config table?

You mean something like, introduce table catalog_v2, and have all
INSERTs to catalog_v1 migrate and redirect the rows? That seems like
it could work today, though it would mean maintaining two different
upgrade paths for the same table, migrating all users of the catalog
to the new name, and needing to drop the old table at... some point
after the restore? I don't know if there would be performance concerns
with larger catalogs (in fact I'm not sure how big these catalogs
get).

> Another point that ought to be made here is that pg_dump is not
> the only outside consumer of extension config data. You're likely
> to break some applications if you change a config table too much.

Such as? We don't really want applications to be coupled against our
internals by accident, but we have to dump the internals to be able to
reproduce the state of the system.

> That's not an argument that we shouldn't try to make pg_dump more
> forgiving, but I'm not sure that we need to move heaven and earth.

Agreed. Hopefully we can find something that just moves a little earth. :D

Thanks!
--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-01-11 18:29:49 Re: logical decoding and replication of sequences, take 2
Previous Message Andres Freund 2023-01-11 18:27:20 Option to not use ringbuffer in VACUUM, using it in failsafe mode