Re: Tracking of page changes for backup purposes. PTRACK [POC]

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tracking of page changes for backup purposes. PTRACK [POC]
Date: 2017-12-21 00:51:58
Message-ID: CAB7nPqTMGy28=pSEVFdTXjbRs9kceQ+3CHQqhR6Wt04kmQJeGw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 21, 2017 at 7:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 20, 2017 at 3:45 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> Isn't it more effective to hold this info in Postgres than in the backup software?
>>> Then any backup software could use this implementation.
>>
>> I don't think that means it can't be implemented in Postgres, but does it
>> need to be done in the backend?
>>
>> For example, it might be a command-line tool similar to pg_waldump,
>> which processes WAL segments and outputs a list of modified blocks,
>> possibly with the matching LSN. Or perhaps something like pg_receivewal,
>> doing that in streaming mode.
>>
>> This part of the solution can still be part of the PostgreSQL codebase, and
>> the rest has to be part of the backup solution anyway.
>
> I agree with all of that.

+1. This summarizes a number of concerns about all the backend
implementations proposed so far. Scanning for the list of modified
blocks in streaming mode makes that information available sooner, but
since you need to switch to a new WAL segment anyway when finishing a
backup, does it really matter? Doing the scan once a segment is
finished would be cheap enough, and you can even process a range of
segments in parallel.

Also, since 9.4 and the introduction of the new WAL API to track
modified blocks, you don't need to know about individual record types
to know which blocks are being changed. Here is an example of a tool I
hacked up in a couple of hours that does what you are looking for,
i.e. a scanner that lists the blocks modified by each record in a
given WAL segment, using xlogreader.c:
https://github.com/michaelpq/pg_plugins/tree/master/pg_wal_blocks

You could just use that, shape the data the way you want, and you
would be good to go.
--
Michael
