Re: Hooks to track changed pages for backup purposes

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hooks to track changed pages for backup purposes
Date: 2017-10-02 10:06:28
Lists: pgsql-hackers

> On 13 Sep 2017, at 15:01, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>>> * I see there are conditions like this:
>>> if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>> Why is it enough to restrict the block-tracking code to main fork?
>>> Aren't we interested in all relation forks?
>> fsm, vm and the other forks are small enough to just take them whole
> That seems like an optimization specific to your backup solution, not
> necessarily to others and/or to other possible use cases.
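The point about not hard-coding a MAIN_FORKNUM filter in core can be sketched as follows. This is a hypothetical illustration, not the patch's actual code: names like wal_block_change_hook and report_block_change are invented here, and the fork number is passed to the hook so each backup solution applies its own filtering policy.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for PostgreSQL's fork numbering. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM,
    INIT_FORKNUM
} ForkNumber;

/* Hypothetical hook type: core reports every changed block, with its fork. */
typedef void (*wal_block_change_hook_type)(ForkNumber forknum,
                                           unsigned int blkno);

static wal_block_change_hook_type wal_block_change_hook = NULL;

static int tracked_blocks = 0;

/* A sample module hook that, unlike a MAIN_FORKNUM-only check in core,
 * chooses for itself to track every fork. */
static void
track_all_forks(ForkNumber forknum, unsigned int blkno)
{
    (void) forknum;             /* no fork filter applied here */
    (void) blkno;
    tracked_blocks++;
}

/* Call site, as it might appear in the WAL-insertion path: a cheap
 * function-pointer check, with no filtering policy baked into core. */
static void
report_block_change(ForkNumber forknum, unsigned int blkno)
{
    if (wal_block_change_hook)
        wal_block_change_hook(forknum, blkno);
}
```

The design choice being argued for is that the filter lives in the module, so a backup tool that does want FSM/VM blocks can have them without patching core.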
>>> I guess you'll have to explain
>>> what the implementation of the hooks is supposed to do, and why these
>>> locations for hook calls are the right ones. It's damn impossible to
>>> validate the patch without that information.
>>> Assuming you still plan to use the hook approach ...
>> Yes, I still think hooking is a good idea, but you are right - I need a
>> prototype first. I'll mark the patch as Returned with Feedback until the
>> prototype is implemented.
> OK
>>>>> There
>>>>> are no arguments fed to this hook, so modules would not be able to
>>>>> analyze things in this context, except shared memory and process
>>>>> state?
>>>>> Those hooks are put in hot code paths, and could impact performance of
>>>>> WAL insertion itself.
>>>> I do not think sending a few bytes to a cached array is comparable to a
>>>> disk write of an XLog record. Checking the func ptr is even cheaper with
>>>> correct branch prediction.
>>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>>> only insert the XLOG records into the WAL buffer here, so why should there
>>> be any disk write involved? Or do you mean the final commit?
>> Yes, I mean that in the end we will be waiting for the disk. A hundred
>> empty ptr checks are negligible in comparison with a disk write.
> Aren't we doing these calls while holding XLog locks? IIRC there was
> quite a significant performance improvement after Heikki reduced the
> amount of code executed while holding the locks.
>>> But more importantly, doesn't this kind of information require some
>>> durability guarantees? I mean, if it gets lost during server crashes or
>>> restarts, doesn't that mean the incremental backups might miss some
>>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>>> achieve that, no?
>> We need durability only at the level of one segment. If we do not have the
>> info from a segment we can just rescan it.
>> If we send a segment to S3 as one file, we are sure of its integrity. And
>> this IO can be async.
>> PTRACK, in its turn, switches bits in the fork's buffers, which are written
>> by the checkpointer and... well... recovered during recovery, by the usual
>> WAL replay.
> But how do you do that from the hooks, if they only store the data into
> a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
> then the system crashes and reboots. How do you know you have incomplete
> information from the WAL segment?
> Although, that's probably what wal_switch_hook() might do - sync the
> data whenever the WAL segment is switched. Right?
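The segment-level durability scheme being discussed can be sketched as below. This is a hypothetical illustration, assuming invented names (note_changed_block, on_segment_switch): changed-block numbers accumulate in a plain in-memory array, and a hook fired at WAL segment switch flushes the batch, so a crash loses at most the data for the current segment, which can simply be rescanned.

```c
#include <assert.h>
#include <stddef.h>

#define TRACK_BUF_SIZE 1024

static unsigned int changed_blocks[TRACK_BUF_SIZE];
static int          n_changed = 0;
static int          flushed_batches = 0;

/* Called from the (hypothetical) WAL-insert hook: just append to the array;
 * no I/O on the hot path. */
static void
note_changed_block(unsigned int blkno)
{
    if (n_changed < TRACK_BUF_SIZE)
        changed_blocks[n_changed++] = blkno;
}

/* Would run from a wal_switch_hook(): persist the batch (a real
 * implementation would write and fsync it, possibly asynchronously,
 * alongside the archived segment) and reset the buffer. */
static void
on_segment_switch(void)
{
    if (n_changed > 0)
    {
        flushed_batches++;      /* stand-in for the actual I/O */
        n_changed = 0;
    }
}
```

This also shows why the scheme answers the crash question only per segment: anything still in the array when the server dies is gone, but the corresponding segment is known to be incompletely tracked and gets rescanned.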
>>> From this POV, the idea to collect this information on the backup system
>>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>>> most promising. It moves the work to another system, the backup system
>>> can make it as durable as the WAL segments, etc.
>> Well, in some not so rare cases users encrypt backups and send them to S3.
>> And there is no system with CPUs that can handle that WAL parsing.
>> Currently, I'm considering mocking up a prototype for wal-g, which works
>> exactly this way.
> Why couldn't there be a system with enough CPU power? Sure, if you want
> to do this, you'll need a more powerful system, but regular CPUs can do
> >1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
> database as part of archive_command, before the encryption, of course.
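The archive_command alternative mentioned above could look roughly like the fragment below. This is only a sketch: extract_changed_pages is a hypothetical pre-processing tool (not an existing program), and the encryption/upload pipeline stands in for whatever the backup solution actually uses.

```
# postgresql.conf sketch -- parse the segment for changed pages first,
# then encrypt and ship it; extract_changed_pages is hypothetical.
archive_command = 'extract_changed_pages %p >> /backup/changed_pages.log && gpg --encrypt --recipient backup -o - %p | aws s3 cp - s3://mybucket/wal/%f'
```

The point is ordering: the WAL parsing happens on the database host, on the cleartext segment, before encryption makes the archive opaque to the backup system.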

Based on the unanswered questions in the discussion in this thread, and since no
new version of the patch has been submitted, I’m marking this returned with
feedback. Please re-submit the patch in a future commitfest when it is ready for
a new review.
cheers ./daniel
