Re: finding changed blocks using WAL scanning

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: finding changed blocks using WAL scanning
Date: 2019-04-18 21:47:56
Message-ID: 20190418214756.7slx55exonivfnbe@momjian.us
Lists: pgsql-hackers

On Thu, Apr 18, 2019 at 04:25:24PM -0400, Robert Haas wrote:
> On Thu, Apr 18, 2019 at 3:51 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > How would you choose the STARTLSN/ENDLSN? If you could do it per
> > checkpoint, rather than per-WAL, I think that would be great.
>
> I thought of that too. It seems appealing, because you probably only
> really care whether a particular block was modified between one
> checkpoint and the next, not exactly when during that interval it was
> modified. However, the simple algorithm of "just stop when you get to
> a checkpoint record" does not work, because the checkpoint record
> itself points back to a much earlier LSN, and I think that it's that
> earlier LSN that is interesting. So if you want to make this work you
> have to be more clever, and I'm not sure I'm clever enough.

OK, so let's back up and study how this will be used. Someone wanting
to make a useful incremental backup will need the changed blocks from
the time of the start of the base backup. It is fine if they
incrementally back up some blocks modified _before_ the base backup, but
they need all blocks after, until some marker. They will obviously
still use WAL to recover to a point after the incremental backup, so
there is no need to get every modified block up to current, just up to
some cut-off point where WAL can be discarded.
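
As a rough illustration of that selection rule, here is a minimal Python sketch. It assumes modified-block information is available as a series of summaries, each covering a half-open LSN range. The ModblockFile type, its fields, and blocks_for_incremental() are hypothetical names for this example only, not anything that exists in PostgreSQL.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple

# Hypothetical block identifier: (relation identifier, block number).
Block = Tuple[str, int]


class ModblockFile(NamedTuple):
    """Assumed summary of the blocks modified in the LSN range [start_lsn, end_lsn)."""
    start_lsn: int
    end_lsn: int
    blocks: FrozenSet[Block]


def blocks_for_incremental(files: Iterable[ModblockFile],
                           base_backup_start_lsn: int,
                           cutoff_lsn: int) -> Set[Block]:
    """Collect every block listed in a summary that overlaps the range
    [base_backup_start_lsn, cutoff_lsn).

    A summary that merely overlaps the range is included whole, so some
    blocks modified before the base backup may be picked up too. That
    over-inclusion is harmless; missing a block would not be.
    """
    needed: Set[Block] = set()
    for f in files:
        if f.end_lsn <= base_backup_start_lsn or f.start_lsn >= cutoff_lsn:
            continue  # summary lies entirely outside the range of interest
        needed |= f.blocks
    return needed
```

Changes after cutoff_lsn are deliberately left out here, on the assumption that WAL replay covers them.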

I can see a 1GB marker being used for that. It would prevent an
incremental backup from being done until the first 1GB modblock file was
written, since until then there is no record of modified blocks, but
that seems fine. A 1GB marker would allow for consistent behavior
independent of server restarts and base backups.
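
To show why a fixed marker behaves the same regardless of restarts and base backups, here is a small Python sketch of the boundary arithmetic. It assumes one modblock file per 1GB of WAL, aligned on LSN multiples of 1GB. MODBLOCK_SPAN and the function names are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: each modblock file covers exactly 1GB of WAL, aligned at LSN 0.
MODBLOCK_SPAN = 1 << 30


def modblock_index(lsn: int) -> int:
    """Index of the modblock file whose range contains this LSN."""
    return lsn // MODBLOCK_SPAN


def modblock_range(index: int) -> Tuple[int, int]:
    """Half-open LSN range [start, end) covered by modblock file `index`."""
    return index * MODBLOCK_SPAN, (index + 1) * MODBLOCK_SPAN


def latest_usable_cutoff(current_lsn: int) -> Optional[int]:
    """End LSN of the last fully written modblock file, or None if the
    first 1GB of WAL has not been completed yet (no incremental backup
    is possible in that case)."""
    completed_files = current_lsn // MODBLOCK_SPAN
    if completed_files == 0:
        return None
    return completed_files * MODBLOCK_SPAN
```

Because the boundaries depend only on the LSN, any two servers or restarts would compute the same file ranges.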

How would the modblock file record all the modified blocks across
restarts and crashes? I assume that a full 1GB of WAL would not always
be available for scanning. I suppose that writing a modblock file to
some PGDATA location when WAL is removed would work, since after a crash
the modblock file could be updated with the contents of the existing
pg_wal files.
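
A minimal Python sketch of that recovery idea follows. It assumes the on-disk modblock data records the LSN up to which it is complete, and it models the surviving WAL as simple (lsn, block) pairs rather than parsing real WAL records. All names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical block identifier: (relfilenode, block number).
Block = Tuple[int, int]


def rebuild_modblocks_after_crash(persisted: Iterable[Block],
                                  persisted_up_to_lsn: int,
                                  surviving_wal: Iterable[Tuple[int, Block]]) -> Set[Block]:
    """Merge the persisted block set with blocks touched by WAL that is
    still present in pg_wal.

    Assumption: `persisted` already covers all WAL below
    persisted_up_to_lsn, because it was written out when that WAL was
    removed. Only records at or beyond that LSN need to be rescanned.
    """
    blocks: Set[Block] = set(persisted)
    for lsn, block in surviving_wal:
        if lsn >= persisted_up_to_lsn:
            blocks.add(block)
    return blocks
```

Rescanning a record that is already reflected in the persisted set would be harmless, since adding a block twice changes nothing.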

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
