| From: | Brad Arndt <brad(dot)arndt(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | Lance Hoover <lance(at)mydatamove(dot)com>, Brad Arndt <brad(dot)arndt(at)gmail(dot)com> |
| Subject: | Feature Proposal: schema renaming in pg_dump/pg_restore |
| Date: | 2025-11-30 14:45:37 |
| Message-ID: | CAE4qKNK1HjMTBAWG5TJU=YHySOD05S4bq7zu1-9Fg30hjt26zQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
First time going through the contribution process with postgres, but from
what I read in the documentation, the best place to start with a new
feature proposal is an email to this mailing list for feedback. So here
goes...
*Use Case:*
We use pg_dump/pg_restore both as a mechanism to "archive"/"restore" data
as well as a mechanism to move data between databases. As part of both of
these operations, one thing we frequently run into is the need to rename
the schema as part of the process (either to fit a naming convention, or
due to naming conflicts).
For a concrete example, suppose we have the following:
- database_1
- schema_1
- schema_2
- database_2
- schema_1
And we wanted to copy database_1.schema_1 to database_2.schema_2.
*What we have tried so far (and the drawbacks)*
1. Using temporary schemas to avoid collisions.
In our example above, this looks like the following:
a. Clone database_1.schema_1 to database_1.temp_schema
b. dump/restore database_1.temp_schema to database_2.temp_schema
c. rename database_2.temp_schema to database_2.schema_2
d. delete database_1.temp_schema
The obvious downside to this approach is the need for the temp schemas.
Especially on database_1 where the data must be duplicated temporarily.
2. "Post-processing" the output of pg_dump to do renaming
Specifically:
a. pg_dump database_1.schema_1 using SQL/text output
b. piping that output to a process which uses regular expressions to
replace schema_1 with schema_2
c. piping that output into pgsql for restore
While we were able to make this workflow work, it has a couple of drawbacks:
1. regular expression replacement can be brittle to maintain as pg_dump
output changes
2. because we needed access to the text sql statements, we were limited to
sql/text output from pg_dump and pgsql import which precluded us from
taking advantage of many of the performance optimizations the other formats
of pg_dump/pg_restore provide
*My Proposal:*
Build in the ability to rename schemas as part of pg_dump (and to a lesser
extent pg_restore - more on that below) by adding a --rename-schema flag.
The flag could support:
- rename all schemas to a single replacement: --rename-schema <replacement
name>
- rename a specific schema: --rename-schema <src schema>:<replacement name>
- or a combination (multiple flag occurrences): --rename-schema
<schema1>:<schema1 replacement> --rename-schema <replacement for all other
schemas>
For example, the following commands would all rename schema_1 to schema_2
in the dump output:
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema
schema_1:schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema schema_2
pg_dump ... --dbname database_1 --schema schema_1 --rename-schema *:schema_2
*Secondary Feature: pg_restore replacement*
Renaming as part of the dump works great when we know the ultimate restore
name at the time of dumping. However, there are times we may want to dump
a db to cold storage, and then restore it at a later time to an arbitary
schema. Renaming on the restore is a little less precise than on dump as in
most cases we are just working with the raw sql statements created by the
dump. So for this use case, I'm proposing a more simplistic text
find/replace. Because of the unstructured nature, it would be on the user
to ensure that their replacement string is unique. For example (again,
using the above scenario):
1. dump database_1.schema_1 and use the dump rename feature to rename
schema1 to some_very_unique_schema in the dump file
2. restore at a later time doing a simple text replacement:
pg_restore ... --replace some_very_unique_schema:schema_1
*What I have done so far / Next Steps:*
I have a working proof of concept built on top of postgres 16.8 (what we
currently have in production).
If the community feels this is a reasonable feature to add, I would like to
go through the process of building on top of the release branch and
submitting the patch. However, I wanted to get feedback on whether this is
a reasonable feature to pursue first.
Open to any and all feedback. Thanks!!
-Brad
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2025-11-30 16:52:00 | Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY |
| Previous Message | Andrey Borodin | 2025-11-30 12:15:42 | Re: IPC/MultixactCreation on the Standby server |