Re: WIP Patch: pg_dump structured

From: Attila Soki <pgsql(at)attilasoki(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: WIP Patch: pg_dump structured
Date: 2023-03-23 15:34:05
Message-ID: 2F0C38DB-A665-4D50-9CEF-1395993C7610@attilasoki.com
Lists: pgsql-hackers

On 12 Mar 2023, at 22:56, Attila Soki <pgsql(at)attilasoki(dot)com> wrote:
>> On 12 Mar 2023, at 21:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> Won't this fail completely with SQL objects whose names aren't suitable
>> to be pathname components? "A/B" is a perfectly good name so far as
>> SQL is concerned. You could also have problems with collisions on
>> case-insensitive filesystems.
>

> You are right about the case-insensitivity; this is not handled and will fail. I forgot
> to handle that. I am trying to find a way to handle this.

Hi Tom,

Thank you for your feedback.

This is an updated version of the pg_dump structured wip patch (V2) with the
following changes:
- to avoid path collisions on case-insensitive file systems, all path components
created from user input are suffixed with the hex representation of a 32-bit
hash. “A/B” and “a/b” will get different suffixes.
- all path components are now filesystem-safe
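
A rough shell sketch of the suffixing idea, for illustration only. The patch
implements this in C inside pg_dump; the function name safe_component, the use
of the POSIX cksum CRC as the 32-bit hash, and the replacement of unsafe
characters with '_' are all assumptions made for this sketch and may differ
from what the patch actually does.

```shell
# Hypothetical helper: turn an SQL object name into a path component that is
# filesystem-safe and does not collide on case-insensitive file systems.
safe_component() {
    name=$1
    # 32-bit hash of the original name (cksum prints "<crc> <byte count>").
    # Assumption: the patch may use a different hash function.
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Replace every byte outside a conservative safe set with '_'.
    safe=$(printf '%s' "$name" | tr -c 'A-Za-z0-9_.-' '_')
    # Append the hash as 8 lowercase hex digits.
    printf '%s_%08x\n' "$safe" "$crc"
}

safe_component "A/B"
safe_component "a/b"
```

Both names map to a component without a slash, and because the hash is computed
from the original, case-sensitive name, the two suffixes differ even where the
file system treats "A_B" and "a_b" as the same name.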

All of this is a proposal; if you know a better solution, please let me know.

This patch is a WIP (V2). The patch is against master and it compiles
successfully on macOS 13.2.1 aarch64 and on Debian 11 arm64.
To test, run:

pg_dump --format=structured --file=/path/to/outputdir dbname

>
>>
>>> This format can be restored by feeding its plaintext toc file (restore-dump.sql)
>>> to psql. The output is also suitable for manipulating the files with standard
>>> editing tools.
>>
>> This seems a little contradictory: if you want to edit the individual
>> files, you'd have to also update restore-dump.sql, or else it's pointless.
>> It might make more sense to consider this as a write-only dump format
>> and not worry about whether it can be restored directly.
>
> The main motivation was to track changes with VCS at the file (object) level,
> editing small files was intended as a second possible use case.
> I did not know that a write-only format would be acceptable.

Declaring this format as a write-only dump would allow a more flexible directory
structure since we wouldn't have to maintain the restore order.

>
>>
>>> What do you think of this feature, any chance it will be added to pg_dump once
>>> the patch is ready?
>>
>> I'm not clear on how big the use-case is. It's not really obvious to
>> me that this'd have any benefit over the existing plain-text dump
>> capability. You can edit those files too, at least till the schema
>> gets too big for your editor. (But if you've got many many thousand
>> SQL objects, a file-per-SQL-object directory will also be no fun to
>> deal with.)

Here is a sample use case to demonstrate how this format could be used to track
schema changes with git. The main difference from using the existing plain-text
schema dump is that this format makes it possible to keep a history of the
actual changes made to the individual objects, for example to determine which
migrations have changed the foo function.

# import the schema into the repository
cd /path/to/my_app_code
pg_dump --format=structured --schema-only --file=foo_schema foodb
git add foo_schema --all
git commit foo_schema -m'initial commit foo_schema'

# make changes in the db
(my_app migrate foodb)
(psql foodb < tweak.sql)

# get a fresh dump
rm -rf foo_schema
pg_dump --format=structured --schema-only --file=foo_schema foodb

# now inspect the changes under foo_schema: there may be changed, new and
# missing files
git status foo_schema

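# commit all schema changes (--all also stages new and deleted files;
# -u alone would skip files for newly created objects)
git add --all foo_schema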
git commit foo_schema -m'changes from migration foodb'

# later, inspect changes
git log --stat

# show the history of one object
git log -p -- "foo_schema/path/to/FUNCTIONS/foo.sql"

Admittedly, the user base for this is narrow.

Thanks for any feedback.


Best regards
Attila Soki

Attachment Content-Type Size
v2-wip-pg_dump_structured.patch application/octet-stream 31.1 KB
unknown_filename text/plain 3 bytes
