Re: Hooks to track changed pages for backup purposes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hooks to track changed pages for backup purposes
Date: 2017-09-13 13:01:55
Message-ID: bb489d5f-4f4e-c161-26c2-18685d720c56@2ndquadrant.com
Lists: pgsql-hackers

On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>> * I see there are conditions like this:
>>
>>    if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>
>> Why is it enough to restrict the block-tracking code to main fork?
>> Aren't we interested in all relation forks?
> fsm, vm and others are small enough to just take them whole.
>

That seems like an optimization specific to your backup solution, not
necessarily to others and/or to other possible use cases.
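
If you do decide to track all forks, the change looks trivial anyway.
A sketch of what I have in mind (track_changed_block() is a placeholder
for whatever the hook implementation does with the info, not something
from your patch):

    int     nblock;

    for (nblock = 0; nblock <= xlogreader->max_block_id; nblock++)
    {
        if (!xlogreader->blocks[nblock].in_use)
            continue;

        /* track every fork (main, fsm, vm, init), not just MAIN_FORKNUM */
        track_changed_block(xlogreader->blocks[nblock].rnode,
                            xlogreader->blocks[nblock].forknum,
                            xlogreader->blocks[nblock].blkno);
    }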

>> I guess you'll have to explain
>> what the implementation of the hooks is supposed to do, and why these
>> locations for hook calls are the right ones. It's damn impossible to
>> validate the patch without that information.
>>
>> Assuming you still plan to use the hook approach ...
> Yes, I still think hooking is good idea, but you are right - I need
> prototype first. I'll mark patch as Returned with feedback before
> prototype implementation.
>

OK

>>
>>>> There
>>>> are no arguments fed to this hook, so modules would not be able to
>>>> analyze things in this context, except shared memory and process
>>>> state?
>>>
>>>>
>>>> Those hooks are put in hot code paths, and could impact performance of
>>>> WAL insertion itself.
>>> I do not think sending a few bytes to a cached array is comparable
>>> to a disk write of an XLog record. Checking the func ptr is even
>>> cheaper with correct branch prediction.
>>>
>>
>> That seems somewhat suspicious, for two reasons. Firstly, I believe we
>> only insert the XLOG records into the WAL buffer here, so why should
>> any disk write be involved? Or do you mean the final commit?
> Yes, I mean that in the end we will be waiting for the disk. A hundred
> empty ptr checks are negligible in comparison with a disk write.

Aren't we doing these calls while holding XLog locks? IIRC there was
quite a significant performance improvement after Heikki reduced the
amount of code executed while holding the locks.
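
Just so we're talking about the same thing -- I assume the patch follows
the usual PostgreSQL hook pattern, roughly like this (the names are
mine, not from the patch):

    /* hook pointer, NULL unless some module installs an implementation */
    typedef void (*wal_block_hook_type) (RelFileNode rnode,
                                         ForkNumber forknum,
                                         BlockNumber blkno);
    wal_block_hook_type wal_block_hook = NULL;

    /* in the WAL insertion path, possibly while holding WAL locks */
    if (wal_block_hook)
        (*wal_block_hook) (rnode, forknum, blkno);

The pointer check itself is cheap, sure -- my worry is about what the
installed hook does while the locks are held.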

>>
>> But more importantly, doesn't this kind of information require some
>> durability guarantees? I mean, if it gets lost during server crashes or
>> restarts, doesn't that mean the incremental backups might miss some
>> buffers? I'd guess the hooks will have to do some sort of I/O, to
>> achieve that, no?
> We need durability only at the level of one segment. If we do not have
> the info from a segment, we can just rescan it.
> If we send the segment to S3 as one file, we can be sure of its
> integrity. But this IO can be async.
>
> PTRACK, in its turn, switches bits in the fork's buffers, which are
> written by the checkpointer and... well... recovered during recovery,
> by the usual WAL replay.
>

But how do you do that from the hooks, if they only store the data into
a buffer in memory? Let's say you insert ~8MB of WAL into a segment, and
then the system crashes and reboots. How do you know the information you
collected for that WAL segment is incomplete?

Although, that's probably what wal_switch_hook() might do - sync the
data whenever the WAL segment is switched. Right?
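
I'd expect it to look roughly like this (a sketch only -- tracked_blocks
/ ntracked / flush_to_storage() are placeholders for whatever the module
keeps in memory and however it persists it):

    static void
    my_wal_switch_hook(XLogSegNo segno)
    {
        /*
         * Persist everything collected for the segment we just closed.
         * If we crash before this completes, the in-memory data is lost
         * and the backup tool has to rescan the whole segment.
         */
        flush_to_storage(segno, tracked_blocks, ntracked);

        /* start collecting from scratch for the next segment */
        ntracked = 0;
    }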

>
>> From this POV, the idea to collect this information on the backup system
>> (WAL archive) by pre-processing the arriving WAL segments seems like the
>> most promising. It moves the work to another system, the backup system
>> can make it as durable as the WAL segments, etc.
>
> Well, in some not-so-rare cases users encrypt backups and send them to
> S3, and there is no system with CPUs that could handle that WAL
> parsing. Currently, I'm considering mocking up a prototype for wal-g,
> which works exactly this way.
>

Why couldn't there be a system with enough CPU power? Sure, if you want
to do this, you'll need a more powerful system, but regular CPUs can do
>1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on the
database server as part of archive_command, before the encryption, of
course.
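
For example something like this, with walparse as a placeholder for the
pre-processing tool extracting the block references:

    archive_command = 'walparse "%p" >> /backup/blocks.list && gpg -e -r backup -o /archive/%f.gpg "%p"'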

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
