Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade

From: Jimmy Angelakos <jimmy(dot)angelakos(at)pgedge(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] pg_dump: Restore extension config table data before user objects during pg_upgrade
Date: 2026-04-06 19:33:19
Message-ID: CAA0-ca3T3LqZvnDU3v_haJS78BO0GEGED_AN0Dc9Eshau7vhMA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andrew,

Thanks for your review!

My opinion is that this is a bugfix rather than a feature: Rather than
adding new capability it's fixing pg_upgrade's behaviour, because it
currently fails in the described scenario (SRID-constrained columns). The
new code path isn't user facing and it only fires during pg_upgrade's
internal use of pg_dump --binary-upgrade.

However, I will defer to the committers' judgement on whether this should
be included in PG19 and backpatched.

To address your feedback, please find attached v2 which:
1. Removes dumpExtensionData() and adds the handling for EXTENSION DATA
object type to dumpTableData()
2. Adds test in test_pg_dump: we insert a row into the dumpable extension
table, and we expect that the COPY appears in --binary-upgrade dumps.

I have also added a commitfest entry.

Thanks again,
Jimmy

On Sun, Mar 29, 2026 at 7:34 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
> On 2026-03-20 Fr 1:47 PM, Jimmy Angelakos wrote:
> > Hi All,
> >
> > I ran into this issue when pg_upgrade-ing a DB with PostGIS. This is
> > my first code patch, so any feedback on the approach will be appreciated!
> >
> > The problem:
> > ============
> > pg_upgrade uses pg_dump --schema-only --binary-upgrade to copy the
> > schema from $oldcluster to $newcluster. Because this excludes all
> > table data, it leaves out data in extension config tables registered
> > with pg_extension_config_dump().
> >
> > In $newcluster, binary_upgrade_create_empty_extension() creates the
> > extensions without populating any table data. The extensions' CREATE
> > EXTENSION scripts never get executed so any INSERTs are skipped. As a
> > consequence, if any CREATE TABLE statement in $newcluster requires
> > validation against these empty config tables, the upgrade fails. As an
> > example,
> > PostGIS registers config table spatial_ref_sys to hold ~8500 spatial
> > reference system definitions (SRIDs). When a table has, e.g. a
> > geometry column that specifies an SRID, this gets validated during the
> > CREATE TABLE:
> >
> > CREATE TABLE points (id int, location geometry(Point, 27700));
> > ERROR: Cannot find SRID (27700) in spatial_ref_sys
> >
> > This will happen for any SRID-constrained column, which will prevent
> > many real-world PostGIS deployments from being able to pg_upgrade. To
> > summarise the problem, our ordering is wrong here because extension
> > configuration data must be present before user tables that depend on
> > it get created, but --schema-only strips this data.
> >
> > The patch:
> > ==========
> > We are adding a new dump object type DO_EXTENSION_DATA that dumps
> > extension config table data in SECTION_PRE_DATA during
> > --binary-upgrade ONLY. This restores the needed data between extension
> > creation and user object creation, allowing the DDL to succeed.
> >
> > Four files are modified in bin/pg_dump:
> >
> > pg_dump.h:
> > Add DO_EXTENSION_DATA to the DumpableObjectType enum, between
> > DO_EXTENSION and DO_TYPE
> >
> > pg_dump_sort.c:
> > Add PRIO_EXTENSION_DATA between PRIO_EXTENSION and PRIO_TYPE
> >
> > pg_dump.c:
> > 1. Add makeExtensionDataInfo() to create a TableDataInfo with objType
> > = DO_EXTENSION_DATA. Called for plain tables (RELKIND_RELATION) during
> > --binary-upgrade ONLY. As it depends on the table def, the COPY will
> > be emitted after the CREATE TABLE.
> > 2. Add dumpExtensionData() to emit the entry in SECTION_PRE_DATA with
> > description "EXTENSION DATA" using dumpTableData_copy(). This allows
> > the config table data to go into the schema-only dump.
> > 3. In processExtensionTables(), when dopt->binary_upgrade is true,
> > call makeExtensionDataInfo() instead of makeTableDataInfo().
> > Additionally, skip extcondition filter because we need to dump all
> > rows here.
> > 4. Include DO_EXTENSION_DATA in pre-data boundary in
> > addBoundaryDependencies()
> >
> > pg_backup_archiver.c:
> > Add "EXTENSION DATA" to the whitelist in _tocEntryRequired() similar
> > to BLOB, BLOB METADATA, etc. to include extension config table data in
> > --schema-only dumps during --binary-upgrade ONLY.
> >
> > What ends up happening:
> > =======================
> > The inserted rows are basically scaffolding to allow the upgrade, and
> > do not persist. The pg_upgrade sequence goes like:
> > 1. pg_dump includes $oldcluster extension config data in schema-only dump
> > 2. pg_restore replays the dump into $newcluster and "EXTENSION DATA"
> > entries populate tables like spatial_ref_sys with COPY. Subsequent
> > CREATE TABLEs with e.g. SRID-constrained columns pass validation.
> > 3. pg_upgrade transfers all data files from $oldcluster to
> > $newcluster, making spatial_ref_sys byte-for-byte identical to its
> > previous state.
> >
> > This patch:
> > 1. Does NOT affect normal pg_dumps (without --binary-upgrade).
> > DO_EXTENSION_DATA objects are not created in this case.
> > 2. Leaves binary_upgrade_create_empty_extension() unchanged.
> > 3. Is not PostGIS-specific, and should solve this class of problem for
> > any extension that registers config tables that will be needed for DDL
> > validation.
> > 4. Has been tested against HEAD at 29bf4ee7496 with $oldcluster
> > PostGIS 3.3.9 on PG14 and $newcluster PostGIS 3.7.0dev/master on
> > PG19-devel.
> >
> > Thanks in advance for your review! Please find attached the patch for
> > HEAD. I believe this should be easily backpatchable to (at least)
> > PG15, and will be happy to work on backports.
>
>
> Hi, Jimmy.
>
> First, as you probably know, we don't backpatch features, and I think
> this comes into that category. Unfortunately, we're about to close
> release 19 for features, so this would need to wait till release 20.
>
> The patch didn't include any tests. It will need them (probably in
> src/test/modules/test_pg_dump)
>
> There appears to be a lot of code duplication between
> dumpExtensionData() and dumpTableData(). It might be better to refactor
> that, perhaps by supplying an extra flag to dumpTableData().
>
> Do make sure to add a Commitfest entry for this is you haven't already
> done so.
>
>
> cheers
>
>
> andrew
>
>
>
>
>
>
>
> >
> > Best regards,
> > Jimmy
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>

--

Jimmy Angelakos

Staff Software Engineer

jimmy(at)pgEdge(dot)com

pgEdge.com <http://pgedge.com/>

Attachment Content-Type Size
v2-0001-pg_dump-Restore-extension-config-table-data-befor.patch text/x-patch 10.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2026-04-06 19:35:00 Re: Add custom EXPLAIN options support to auto_explain
Previous Message Robert Haas 2026-04-06 19:30:49 Re: Add custom EXPLAIN options support to auto_explain