Re: WIP Patch: pg_dump structured

From: Attila Soki <pgsql(at)attilasoki(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: WIP Patch: pg_dump structured
Date: 2023-03-12 21:56:06
Message-ID: 374850DA-783D-4616-BE03-3CCAA4AB7FB2@attilasoki.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 12 Mar 2023, at 21:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Attila Soki <pgsql(at)attilasoki(dot)com> writes:
>> This patch adds the structured output format to pg_dump.
>> This format is a plaintext output split up into multiple files and the
>> resulting small files are stored in a directory path based on the dumped object.
>
> Won't this fail completely with SQL objects whose names aren't suitable
> to be pathname components? "A/B" is a perfectly good name so far as
> SQL is concerned. You could also have problems with collisions on
> case-insensitive filesystems.

The “A/B” case is handled in _CleanFilename function, the slash and other
problematic characters are replaced.
You are right about the case-insensivity, this is not handled and will fail. I forgot
to handle that. I trying to find a way to handle this.

>
>> This format can be restored by feeding its plaintext toc file (restore-dump.sql)
>> to psql. The output is also suitable for manipulating the files with standard
>> editing tools.
>
> This seems a little contradictory: if you want to edit the individual
> files, you'd have to also update restore-dump.sql, or else it's pointless.
> It might make more sense to consider this as a write-only dump format
> and not worry about whether it can be restored directly.

The main motivation was to track changes with VCS at the file (object) level,
editing small files was intended as a second possible use case.
I did not know that a write-only format would go.

>
>> What do you think of this feature, any chance it will be added to pg_dump once
>> the patch is ready?
>
> I'm not clear on how big the use-case is. It's not really obvious to
> me that this'd have any benefit over the existing plain-text dump
> capability. You can edit those files too, at least till the schema
> gets too big for your editor. (But if you've got many many thousand
> SQL objects, a file-per-SQL-object directory will also be no fun to
> deal with.)

I use something like this (a previous version) to track several thousand
objects. But I'm not sure if that would have a wide user base.
Therefore the wip to see if there is interest in this feature.
I think the advantage of having many small files is that it is recognizable
which file (object) is involved in a commit and that the SQL functions and
tables get a change history.

Thank you for your feedback.

Regards,
Attila Soki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2023-03-12 22:06:03 Re: Progress report of CREATE INDEX for nested partitioned tables
Previous Message Peter Smith 2023-03-12 21:14:03 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher