Re: In-order pg_dump (or in-order COPY TO)

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: In-order pg_dump (or in-order COPY TO)
Date: 2025-08-27 16:10:58
Message-ID: 4ss66r31-558o-qq24-332q-no351p7n5osr@tzk.arg
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:

> Comments in line below.
>
>> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
>>
>> pg_dump -v --format=custom --compress=none --no-toast-compression --
>> serializable-deferrable db_name  |  borg create ...
>>
>>
>> As you can see the backup (and deduplicating) program is borgbackup.
>
> Ok, I use BorgBackup and it is fairly forgiving of normal changes.
>
> FYI, if you ever want to use compression check out gzip --rsyncable, I have
> found it plays well with Borg. For more information see:
>
> https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

Yes, zstd has also --rsyncable. In this case I let borg do per-chunk
compression after deduplication, it has worked well so far.

>> Restore is in PostgreSQL 17:
>>
>> I first create the empty tables by running the DDL commands in version
>> control to setup the database. And then I do pg_restore --data-only:
>>
>> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --
>> section=data  dump_file
>
> If you are using only the --data section why not --data-only in the pg_dump?

I want the dump to be as complete as possible. Didn't think it would
create issues.

>
> Or is the pg_dump output used for other purposes?

It has happened that I have selectively restored user schemas from that
dump.

>> Worth noting is that the above pg_restore goes through the WAL, i.e. all
>> writes are done by walwriter, not the backend directly.
>
> Please explain the above further.

The COPY FROM data is going through the WAL, as usual INSERTS do. The
writes to disk happen by the walwriter process.

OTOH, If you have configured the server with wal_level=minimal and
BEGIN a transaction, CREATE or TRUNCATE a table, and then COPY FROM into
that table, then the backend process writes directly to the table
without logging to the WAL.

This can be much faster, but most importantly it avoids situations of
WAL overflow that are very difficult to predict and can mess your server
up completely. [1]

[1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net

My patches are for activating that codepath in pg_restore, but they were
not used on purpose and I took notice that the writes went via WAL.

>
> The problem occurs when you do the pg_dump after this restore, correct?

Correct. The first pg_dump from the restored pg17 is not deduplicated at
all. Most of the tables have not changed (logically at least; apparently
they have changed physically).

>
> Is it the same pg_dump command as you show above?

Yes.

>
>>
>> Postgres is standard open source running on own server. It has a couple of
>> custom patches that shouldn't matter in this codepath.
>
> For completeness and just in case they may affect the output what do the
> patches do?

Two patches for speeding up scanning an archive without TOC, like the
one I'm having (because it is piped into borg, instead of written to
file). These were activated, but shouldn't matter. They only build the
TOC in pg_restore's memory.

https://commitfest.postgresql.org/patch/5809/
https://commitfest.postgresql.org/patch/5817/

And two patches for speeding up pg_restore like mentioned above, under
specific arguments that I didn't provide. (one speedup needs --clean,
and the other needs --freeze).

https://commitfest.postgresql.org/patch/5821/
https://commitfest.postgresql.org/patch/5826/

IIRC I did not activate them (via --clean) because TRUNCATE fails when
foreign keys exist. See the discussion threads.

Dimitris

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Gustafsson 2025-08-27 19:59:18 Re: How to configure client-side TLS ciphers for streaming replication?
Previous Message Ron Johnson 2025-08-27 15:27:51 psql --html and to_char()