| From: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Add --extra-dependencies and immediate data dumping for pg_dump/pg_upgrade |
| Date: | 2025-12-24 11:44:18 |
| Message-ID: | CAM2+6=UstF2jQc8tZMbb3A-ag84-UhKs2OnYQn7pwwarY9i2nA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello Hackers,
We have identified a dependency issue—most notably observed with the
PostGIS extension—where a table's column definition relies on data existing
in another table's catalog at restore time. Because pg_dump typically
separates schema and data into distinct sections, these implicit data-level
dependencies are not captured, leading to failures during pg_upgrade or
pg_restore.
Jakub Wartak previously reported a detailed example of this issue here:
https://www.postgresql.org/message-id/CAKZiRmwWyh-yGM8Hrvuuo04JiYFy8S4TLM-3Mn-zi9Rfqc744Q%40mail.gmail.com
Following a discussion with Alvaro Herrera, we have developed a patch based
on his suggestions.
The Problem
In certain extension-heavy schemas, an object's schema definition cannot be
created unless another table's data is already populated. Current pg_dump
logic handles schema-to-schema dependencies via pg_depend, but it lacks a
mechanism to:
1.
Enforce a specific order for dependencies not recorded in pg_depend.
2.
Interleave data loading with schema creation for specific tables.
Proposed Solution
This patch introduces a new option, --extra-dependencies, for pg_dump and
pg_upgrade.
*1. Dependency Hinting:* The option allows users to provide manual
dependency hints in the format table#referenced_table. Internally, pg_dump
treats these as "faked" dependencies, ensuring the referenced table is
sorted before the dependent object in the dump graph, similar to standard
pg_depend entries.
*2. Immediate Data Dumping:* To satisfy the data-level requirement, the
patch ensures that any table referenced via this option has its data dumped
immediately following its definition (utilizing the dumpTableData()
infrastructure) rather than in the general DATA section of the dump.
Use Case
While this is a specialized tool intended for unrecorded dependencies (like
those in PostGIS), it provides a necessary safety valve for migrations that
currently require manual intervention or complex workarounds.
Attached are the patches (PoC) for review. We look forward to your thoughts
and suggestions.
Regards,
--
*Jeevan Chalke*
*Principal Engineer, Engineering Manager*
*Product Development*
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-pg_dump-Add-extra-dependencies-option.patch | application/octet-stream | 13.0 KB |
| v1-0002-pg_upgrade-Add-extra-dependencies-option.patch | application/octet-stream | 5.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Matthias van de Meent | 2025-12-24 11:54:48 | Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM? |
| Previous Message | sunil s | 2025-12-24 11:42:36 | Re: Avoid corrupting DefElem nodes when parsing publication_names and publish options |