Re: Tracking of page changes for backup purposes. PTRACK [POC]

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tracking of page changes for backup purposes. PTRACK [POC]
Date: 2017-12-27 09:37:37
Message-ID: 1D5FDB76-D7AC-46BB-B684-50C90E0E7BBE@yandex-team.ru
Lists: pgsql-hackers

Hi!

> On 21 Dec 2017, at 5:51, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> On Thu, Dec 21, 2017 at 7:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Dec 20, 2017 at 3:45 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> Isn't more effective hold this info in Postgres than in backup sw?
>>>> Then any backup sw can use this implementation
>> [Skipped]
>> I agree with all of that.
>
> +1. This summarizes a bunch of concerns about all kinds of backend
> implementations proposed. Scanning for a list of blocks modified via
> streaming gives more availability, but knowing that you will need to
> switch to a new segment anyway when finishing a backup, does it really
> matter? Doing it once a segment has finished would be cheap enough,
> and you can even do it in parallel with a range of segments.
>
> Also, since 9.4 and the introduction of the new WAL API to track
> modified blocks, you don't need to know about the record types to know
> which blocks are being changed. Here is an example of tool I hacked up
> in a couple of hours that does actually what you are looking for, aka
> a scanner of the blocks modified per record for a given WAL segment
> using xlogreader.c:
> https://github.com/michaelpq/pg_plugins/tree/master/pg_wal_blocks
>
> You could just use that and shape the data in the way you want and you
> would be good to go.

Michael, that's almost what I want. I've even filed a GSoC proposal for this [0].
But can we have something like this in Postgres?
The tool I'm hacking on is written in Go, so I cannot just embed a bunch of Postgres C code into it. That is why an API like PTRACK suits my needs better. Not because it uses any superior mechanics, but because it is an API, ready for external third-party backup software (having backup software in Pg would be even better).

Anastasia, I've implemented PTRACK support in WAL-G.
First, there are a few minor issues with the patch:
1. There is a malformed comment
2. Function pg_ptrack_version() is absent

Then, I think the API is far from perfect. pg_ptrack_get_and_clear() changes the global ptrack_clear_lsn, which introduces some weakness (for the paranoid). Maybe use something like "pg_ptrack_get_and_clear(oid,oid,previous_lsn)", which would fail if previous_lsn does not match? Also, pg_ptrack_get_and_clear() does not return an error when there is no table with the given oid. Finally, I had to interpret any empty map as the absence of a map. From my POV, the function must fail on errors like: invalid oid passed, no table found, no PTRACK map exists, etc.
I use an external file-tracking mechanism, so the function pg_ptrack_init_get_and_clear() was of no use to me.
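
To make the previous_lsn idea concrete, here is a minimal Go sketch of the semantics proposed above. It is only an in-memory model, not the patch's code: the tracker type, the error names and the single-oid signature are all hypothetical, and the only thing taken from the discussion is that the call should fail on an unknown oid, on a missing map, and on a previous_lsn that does not match the LSN of the last clear.

```go
package main

import (
	"errors"
	"fmt"
)

// LSN is a WAL position; a plain integer is enough for this sketch.
type LSN uint64

// Hypothetical error values, one per failure case named in the text.
var (
	ErrNoRelation  = errors.New("no relation with this oid")
	ErrNoMap       = errors.New("no PTRACK map for this relation")
	ErrLSNMismatch = errors.New("previous_lsn does not match ptrack_clear_lsn")
)

// tracker is an in-memory stand-in for the server-side state (assumption).
type tracker struct {
	clearLSN LSN               // LSN of the last successful clear
	maps     map[uint32][]byte // relation oid -> bitmap; a nil value means "no map"
}

// getAndClear returns a copy of the map for oid and zeroes the stored map,
// but only if the caller's previousLSN equals the LSN of the last clear.
func (t *tracker) getAndClear(oid uint32, previousLSN, currentLSN LSN) ([]byte, error) {
	m, ok := t.maps[oid]
	if !ok {
		return nil, ErrNoRelation
	}
	if m == nil {
		return nil, ErrNoMap
	}
	if previousLSN != t.clearLSN {
		return nil, ErrLSNMismatch
	}
	out := make([]byte, len(m))
	copy(out, m)
	for i := range m {
		m[i] = 0
	}
	t.clearLSN = currentLSN
	return out, nil
}

func main() {
	t := &tracker{clearLSN: 100, maps: map[uint32][]byte{16384: {0x05}, 16385: nil}}

	// A caller that missed an intermediate clear is rejected instead of
	// silently receiving an incomplete map.
	_, err := t.getAndClear(16384, 90, 200)
	fmt.Println(err)

	// A caller with the matching LSN gets the map and advances clearLSN.
	m, err := t.getAndClear(16384, 100, 200)
	fmt.Println(m, err)
}
```

With distinct errors, an empty but existing map (nothing changed) no longer has to be treated the same way as a missing one.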

Last, but most important for me: my tests showed lost page updates. Probably it is a bug or paranoia in my test software. But may I ask you to check this code [1], which converts a PTRACK map to a list of block numbers? Do I understand the meaning of the PTRACK map correctly? Thank you very much.
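
For reference, the interpretation in question can be sketched in Go as below. This is a simplified illustration, not the code at [1]. It assumes the map is a plain bitmap with one bit per block and that bit i of byte n (least significant bit first) stands for block n*8+i. That layout is exactly the assumption that needs confirming, and blocksFromBitmap is a hypothetical name.

```go
package main

import "fmt"

// blocksFromBitmap lists the block numbers whose bit is set in the map.
// ASSUMPTION: bit i of byte n marks block n*8+i, least significant bit first.
func blocksFromBitmap(bitmap []byte) []uint32 {
	var blocks []uint32
	for n, b := range bitmap {
		for i := 0; i < 8; i++ {
			if b&(1<<uint(i)) != 0 {
				blocks = append(blocks, uint32(n*8+i))
			}
		}
	}
	return blocks
}

func main() {
	// 0x05 sets bits 0 and 2 of byte 0; 0x80 sets bit 7 of byte 1.
	fmt.Println(blocksFromBitmap([]byte{0x05, 0x80})) // prints [0 2 15]
}
```

If the server actually stores the bits in the opposite order within a byte, a decoder like this would report the wrong blocks, which would look exactly like lost page updates.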

[0] https://wiki.postgresql.org/index.php?title=GSoC_2018#WAL-G_delta_backups_with_WAL_scanning_.282018.29
[1] https://github.com/wal-g/wal-g/blob/ptrack/pagefile.go#L167-L173
