Re: Hooks to track changed pages for backup purposes

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hooks to track changed pages for backup purposes
Date: 2017-09-13 05:53:53
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Tomas! Thank you for looking into that patch.

> 8 сент. 2017 г., в 1:53, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> написал(а):
> A few more comments:
> * The patch defines wal_switch_hook, but it's never called.
That call was missing, that's a bug, thanks for spotting that out.

> * I see there are conditions like this:
> if(xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
> Why is it enough to restrict the block-tracking code to main fork?
> Aren't we interested in all relation forks?
fsm, vm and others are small enough to take them

> I guess you'll have to explain
> what the implementation of the hooks is supposed to do, and why these
> locations for hook calls are the right ones. It's damn impossible to
> validate the patch without that information.
> Assuming you still plan to use the hook approach ...
Yes, I still think hooking is good idea, but you are right - I need prototype first. I'll mark patch as Returned with feedback before prototype implementation.

>>> There
>>> are no arguments fed to this hook, so modules would not be able to
>>> analyze things in this context, except shared memory and process
>>> state?
>>> Those hooks are put in hot code paths, and could impact performance of
>>> WAL insertion itself.
>> I do not think sending few bytes to cached array is comparable to disk
> write of XLog record. Checking the func ptr is even cheaper with correct
> branch prediction.
> That seems somewhat suspicious, for two reasons. Firstly, I believe we
> only insert the XLOG records into WAL buffer here, so why should there
> be any disk write related? Or do you mean the final commit?
Yes, I mean finally we will be waiting for disk. Hundred empty ptr checks are neglectable in comparision with disk.
> But more importantly, doesn't this kind of information require some
> durability guarantees? I mean, if it gets lost during server crashes or
> restarts, doesn't that mean the incremental backups might miss some
> buffers? I'd guess the hooks will have to do some sort of I/O, to
> achieve that, no?
We need durability only on the level of one segment. If we do not have info from segment we can just rescan it.
If we send segment to S3 as one file, we are sure in it's integrity. But this IO can by async.

PTRACK in it's turn switch bits in fork's buffers which are written in checkpointer and..well... recovered during recovery. By usual WAL replay of recovery.

> From this POV, the idea to collect this information on the backup system
> (WAL archive) by pre-processing the arriving WAL segments seems like the
> most promising. It moves the work to another system, the backup system
> can make it as durable as the WAL segments, etc.

Well, in some not so rare cases users encrypt backups and send to S3. And there is no system with CPUs that can handle that WAL parsing. Currently, I'm considering mocking prototype for wal-g, which works exactly this.

Your comments were very valuable, thank you for looking into the patch and joining the discussion.

Best regards, Andrey Borodin.

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-09-13 05:59:53 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Amit Langote 2017-09-13 05:48:12 Re: Setting pd_lower in GIN metapage