Re: In-order pg_dump (or in-order COPY TO)

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: In-order pg_dump (or in-order COPY TO)
Date: 2025-09-04 16:08:12
Message-ID: sn171540-722q-7o58-9153-541o63rq2s56@tzk.arg
Lists: pgsql-general

Hi Álvaro and Greg,

On Thursday 2025-09-04 14:02, Álvaro Herrera wrote:

> It's generally considered nowadays that pg_dump is not the best option
> to create backups of very large databases. You may be better served by
> using a binary backup tool -- something like Barman. With current
> Postgres releases you can create incremental backups, which would
> probably be more effective at deduplicating than playing with pg_dump's
> TOC, because it's based on what actually happens to the data. Barman
> provides support for hook scripts, which perhaps can be used to transfer
> the backup files to Borg. (I haven't actually tried to do this, but the
> Barman developers talk about using them to transfer the backups to tape,
> so I imagine getting them to play with Borg is a Simple Matter of
> Programming.)

On Wed, 27 Aug 2025, Greg Sabino Mullane wrote:

> I suggest looking into pgBackRest, and its block incremental feature,
> which sounds similar to what you are doing. But it also does it with
> parallel processes, and can do things like grab backup files from your
> replicas, plus a lot of other features.

If I'm not mistaken, both Barman and pgBackRest are based on physical
copies of the database (pg_basebackup). At the start of this project I
had evaluated pg_basebackup, but decided that a logical backup fitted my
needs better.

+ pg_basebackup was slower; I measured speeds of around 10 MB/s, because
of issues with the 8KB page size and compressed btrfs (see [1]; the
situation has since improved on both the postgres side and the kernel
side; I'm not sure how pg_basebackup fares today).

+ pg_basebackup output was much bigger, because it includes indices etc.
As a result of the larger size and the lower speed, a pg_basebackup run
also took much longer.

+ physical dumps would change a lot during maintenance (VACUUM FULL,
CLUSTER etc.) even though the data remained the same. This would reduce
the effect of deduplication and increase the space requirements even
further. At that point in time I did not expect that logical dumps would
also change when the data hadn't changed.

+ I use logical dumps as a tool, not only as a backup, to copy the
database to other servers running different postgresql versions (rough
commands sketched after this list).

+ I also use them to verify the VCS-committed SQL schema: doing pg_restore
--data-only into an already created database will fail if the SQL schema
had been modified on the original server without committing the changes
(also sketched after this list).

+ Finally, I don't really need the advanced features that physical
replication offers, like HA, PITR or load balancing. It's a
non-mission-critical service that can afford some downtime in case of
disaster recovery.
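
For illustration, the cross-version copy is roughly the following: a
custom-format dump restored on the target server (hostnames and database
names are placeholders, and the options are trimmed down):

    # dump on/from the old server, then restore into a fresh database elsewhere
    pg_dump -h old-server -Fc -f mydb.dump mydb
    createdb -h new-server mydb
    pg_restore -h new-server -d mydb mydb.dump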
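
The schema check is essentially a data-only restore into a database freshly
created from the schema in version control; any uncommitted schema change on
the original server makes the restore error out (again, names are
placeholders; --exit-on-error just makes the failure immediate):

    # recreate the committed schema, then try to load only the data into it
    createdb -h test-server mydb
    psql -h test-server -d mydb -f schema_from_vcs.sql
    pg_restore -h test-server -d mydb --data-only --exit-on-error mydb.dump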

[1] https://www.postgresql.org/message-id/flat/218fa2e0-bc58-e469-35dd-c5cb35906064%40gmx.net

Regards,
Dimitris
