Re: block-level incremental backup

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-10 18:21:45
Message-ID: 20190410202145.425d4aa6@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

First thank you for your answer!

On Wed, 10 Apr 2019 12:21:03 -0400
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Apr 10, 2019 at 10:57 AM Jehan-Guillaume de Rorthais
> <jgdr(at)dalibo(dot)com> wrote:
> > My idea would be create a new tool working on archived WAL. No burden
> > server side. Basic concept is:
> >
> > * parse archives
> > * record latest relevant FPW for the incr backup
> > * write new WALs with recorded FPW and removing/rewriting duplicated
> > walrecords.
> >
> > It's just a PoC and I hadn't finished the WAL writing part...not even
> > talking about the replay part. I'm not even sure this project is a good
> > idea, but it is a good educational exercice to me in the meantime.
> >
> > Anyway, using real life OLTP production archives, my stats were:
> >
> > # WAL xlogrec kept Size WAL kept
> > 127 39% 50%
> > 383 22% 38%
> > 639 20% 29%
> >
> > Based on this stats, I expect this would save a lot of time during recovery
> > in a first step. If it get mature, it might even save a lot of archives
> > space or extend the retention period with degraded granularity. It would
> > even help taking full backups with a lower frequency.
> >
> > Any thoughts about this design would be much appreciated. I suppose this
> > should be offlist or in a new thread to avoid polluting this thread as this
> > is a slightly different subject.
>
> Interesting idea, but I don't see how it can work if you only deal
> with the FPWs and not the other records. For instance, suppose that
> you take a full backup at time T0, and then at time T1 there are two
> modifications to a certain block in quick succession. That block is
> then never touched again. Since no checkpoint intervenes between the
> modifications, the first one emits an FPI and the second does not.
> Capturing the FPI is fine as far as it goes, but unless you also do
> something with the non-FPI change, you lose that second modification.
> You could fix that by having your tool replicate the effects of WAL
> apply outside the server, but that sounds like a ton of work and a ton
> of possible bugs.

In my current design, the scan is done backward from end to start and I keep all
the records appearing after the last occurrence of their respective FPI.

The next challenge I have to achieve is to deal with multiple blocks records
where some need to be removed and other are FPI to keep (eg. UPDATE).

> I have a related idea, though. Suppose that, as Peter says upthread,
> you have a replication slot that prevents old WAL from being removed.
> You also have a background worker that is connected to that slot. It
> decodes WAL and produces summary files containing all block-references
> extracted from those WAL records and the associated LSN (or maybe some
> approximation of the LSN instead of the exact value, to allow for
> compression and combining of nearby references). Then you hold onto
> those summary files after the actual WAL is removed. Now, when
> somebody asks the server for all blocks changed since a certain LSN,
> it can use those summary files to figure out which blocks to send
> without having to read all the pages in the database. Although I
> believe that a simple system that finds modified blocks by reading
> them all is good enough for a first version of this feature and useful
> in its own right, a more efficient system will be a lot more useful,
> and something like this seems to me to be probably the best way to
> implement it.

Summary files looks like what Andrey Borodin described as delta-files and
change maps.

> With an approach based
> on WAL-scanning, the work is done in the background and nobody has to
> wait for it.

Agree with this.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-04-10 18:38:43 Re: block-level incremental backup
Previous Message Alexander Korotkov 2019-04-10 17:41:17 Re: Status of the table access method work