Re: Adding hook in BufferSync for backup purposes

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, stalkerg(at)gmail(dot)com, root(at)simply(dot)name
Subject: Re: Adding hook in BufferSync for backup purposes
Date: 2017-08-29 05:16:59
Message-ID: 449A7A9D-DB58-40F8-B80E-4C4EE7DB47FD@yandex-team.ru
Lists: pgsql-hackers

Hi hackers!

> On 8 Aug 2017, at 11:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> My point is not to claim that we mustn't put a hook there. It's that what
> such a hook could safely do is tightly constrained, and you've not offered
> clear evidence that there's something useful to be done within those
> constraints. Alvaro seems to think that the result might be better
> reached by hooking in at other places, and my gut reaction is similar.
>

I was studying the work done by Marco and Gabriel on tracking pages for incremental backup, and I found the PTRACK patch by Yury Zhuravlev and PostgresPro [0]. That work seems solid and thoughtful. I have drafted a new prototype of hooks that enable incremental backup as an extension, based on the PTRACK calls.

If you look at the original patch, you will see that the attached patch differs slightly: the functionality to track the end of a critical section is omitted. I have not included it in this draft because it changes a very sensitive place, even for those who do not need it.

A subscriber of the proposed hook must remember that it is invoked inside a critical section. But there cannot be more than XLR_MAX_BLOCK_ID blocks for one transaction. The proposed draft does not add any functionality to track finished transactions or any other atomic unit of work; it just provides a stream of changed block numbers. Nor does the draft make any assumption about where to store this information. I suppose subscribers could trigger asynchronous writes somewhere once the info for a given segment has been accumulated (do we need a hook on segment end, then?). During an incremental backup you can skip scanning those WAL segments for which you have an accumulated changeset of block numbers.
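To illustrate the shape of such a hook, here is a minimal standalone sketch. It is not taken from the attached patch: the hook name, its signature, and the simplified types (RelFileNodeSketch, changed_block_hook_type) are assumptions made for illustration, and the code compiles without PostgreSQL headers. The subscriber only updates preallocated counters, since code running inside a critical section must not allocate memory or raise errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types; illustration only. */
typedef uint32_t BlockNumber;
typedef struct RelFileNodeSketch
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} RelFileNodeSketch;

/* Hypothetical hook type: name and signature are assumptions. */
typedef void (*changed_block_hook_type) (RelFileNodeSketch rnode,
                                         int forknum,
                                         BlockNumber blkno);

/* NULL means nobody is subscribed. */
changed_block_hook_type changed_block_hook = NULL;

/* Install a subscriber; this sketch supports only one, without chaining. */
void
set_changed_block_hook(changed_block_hook_type hook)
{
    assert(changed_block_hook == NULL || hook == NULL);
    changed_block_hook = hook;
}

/* What the core side would do for every block it reports as changed. */
void
report_changed_block(RelFileNodeSketch rnode, int forknum, BlockNumber blkno)
{
    if (changed_block_hook != NULL)
        changed_block_hook(rnode, forknum, blkno);
}

/* Example subscriber: touches only preallocated state, cannot fail. */
uint64_t    blocks_seen = 0;
BlockNumber last_block = 0;

void
count_changed_block(RelFileNodeSketch rnode, int forknum, BlockNumber blkno)
{
    (void) rnode;
    (void) forknum;
    blocks_seen++;
    last_block = blkno;
}
```

An extension would call something like set_changed_block_hook(count_changed_block) at load time; the core pays only a NULL check when no subscriber is installed.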

The thing that is not clear to me: if we accumulate block numbers inside a critical section, we must place them in a preallocated array. When is the best time to take these block numbers away, emptying the array and avoiding overflow? PTRACK has an array of XLR_MAX_BLOCK_ID length and saves this array at the end of each critical section. But I want to avoid intervening in critical sections.
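One possible way to frame the question is a fixed-size buffer that is filled inside the critical section and drained at some later, safe point. The sketch below is standalone C under stated assumptions: the capacity constant, the struct, and both function names are hypothetical, and it does not answer where the drain call should be placed. It only shows that the add path needs no allocation and that overflow can be recorded as a flag instead of an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Assumed capacity for this sketch; a real value would follow the WAL limits. */
#define CHANGED_BLOCKS_CAPACITY 64

typedef struct ChangedBlockBuffer
{
    BlockNumber blocks[CHANGED_BLOCKS_CAPACITY];    /* preallocated storage */
    size_t      count;                              /* slots currently used */
    bool        overflowed;                         /* set if a block was dropped */
} ChangedBlockBuffer;

/*
 * Intended for use inside a critical section: no allocation, no I/O,
 * and no failure path. On overflow the block is dropped and flagged.
 */
void
changed_blocks_add(ChangedBlockBuffer *buf, BlockNumber blkno)
{
    if (buf->count < CHANGED_BLOCKS_CAPACITY)
        buf->blocks[buf->count++] = blkno;
    else
        buf->overflowed = true;
}

/*
 * Intended for use outside the critical section: copy the accumulated
 * block numbers to "out", report whether any were lost, and reset.
 * Returns the number of block numbers copied.
 */
size_t
changed_blocks_drain(ChangedBlockBuffer *buf, BlockNumber *out,
                     size_t out_len, bool *lost)
{
    size_t      n = buf->count;

    assert(out_len >= CHANGED_BLOCKS_CAPACITY);
    for (size_t i = 0; i < n; i++)
        out[i] = buf->blocks[i];
    *lost = buf->overflowed;
    buf->count = 0;
    buf->overflowed = false;
    return n;
}
```

If the drain reports lost blocks, a consumer could fall back to scanning the corresponding WAL segment instead of trusting the accumulated changeset.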

Thank you for your attention; any thoughts will be appreciated.

Best regards, Andrey Borodin.

[0] https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b

Attachment: 0001-hooks-to-watch-for-changed-pages.patch (application/octet-stream, 2.8 KB)
