Re: trying again to get incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trying again to get incremental backup
Date: 2023-10-04 19:33:27
Message-ID: CA+TgmoZxPpWU11+yQCHr8obHK5nAyb96KpN25bk4PdX51dUFFQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 28, 2023 at 6:22 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> all those basic tests had GOOD results. Please find attached. I'll try
> to schedule some more realistic (in terms of workload and sizes) test
> in a couple of days + maybe have some fun with cross-backup-and
> restores across standbys.

That's awesome! Thanks for testing! This can definitely benefit from
any amount of beating on it that people wish to do. It's a complex,
delicate area that risks data loss.

> If that is still an area open for discussion: wouldn't it be better to
> just specify an LSN, since that would allow resyncing a standby across
> major lag where the WAL to replay would be enormous? Given a
> primary->standby setup where the standby is stuck on some LSN, right
> now it would be:
> 1) calculate backup manifest of desynced 10TB standby (how? using
> which tool?) - even if possible, that means reading 10TB of data
> instead of just supplying a number, doesn't it?
> 2) backup primary with such incremental backup >= LSN
> 3) copy the incremental backup to standby
> 4) apply it to the impaired standby
> 5) restart the WAL replay

Hmm. I wonder if this would even be a safe procedure. I admit that I
can't quite see a problem with it, but sometimes I'm kind of dumb.

> Also maybe it's too early to ask, but wouldn't it be nice if we could
> have a future option in pg_combinebackup to avoid double writes when
> used from restore hosts (right now we need to first reconstruct the
> original datadir from full and incremental backups on the host holding
> the backups and then transfer it again to the target host)? So
> something like this could work well from restorehost: pg_combinebackup
> /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar -o - | ssh
> dbserver 'tar xvf - -C /path/to/restored/cluster'. The bad thing is
> that such a pipe prevents parallelism from day 1 and I'm afraid I do
> not have a better easy idea on how to have both at the same time in
> the long term.

I don't think it's too early to ask for this, but I do think it's too
early for you to get it. ;-)

--
Robert Haas
EDB: http://www.enterprisedb.com
