Tracking of page changes for backup purposes. PTRACK [POC]

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Tracking of page changes for backup purposes. PTRACK [POC]
Date: 2017-12-18 10:18:48
Message-ID: 429c92fd-dd2d-54e0-a41d-3673a0726f57@postgrespro.ru
Lists: pgsql-hackers

In this thread I would like to raise the issue of incremental backups.
What I suggest is that we choose one direction, so that we can
concentrate our community efforts.
There are already a number of tools that provide incremental backups,
and they use five principal techniques:

1. Use file modification time as a marker that the file has changed.
2. Compute file checksums and compare them.
3. LSN-based mechanisms. Back up pages with LSN >= the LSN of the last backup.
4. Scan all WAL files in the archive since the previous backup and
collect information about changed pages.
5. Track page changes on the fly. (ptrack)
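To make the trade-off of technique 3 concrete, here is a minimal Python sketch. It assumes a simplified page that carries only its LSN; the Page class and select_changed_pages() are hypothetical names, not part of PostgreSQL or any backup tool. It shows that the LSN test picks out only the modified pages, but every page still has to be read to find out.

```python
from dataclasses import dataclass

BLCKSZ = 8192  # PostgreSQL's default block size in bytes


@dataclass
class Page:
    """Hypothetical, simplified page: only the LSN of the last WAL record
    that modified it matters here (cf. pd_lsn in the real page header)."""
    lsn: int


def select_changed_pages(pages: list[Page], last_backup_lsn: int) -> list[int]:
    """Technique 3: return the block numbers of pages whose LSN is at or
    after the LSN of the previous backup. Every page is still read just
    to inspect its LSN."""
    return [blkno for blkno, page in enumerate(pages)
            if page.lsn >= last_backup_lsn]


relation = [Page(100), Page(250), Page(180), Page(300)]
changed = select_changed_pages(relation, last_backup_lsn=200)
print(changed)  # [1, 3]
print(len(relation) * BLCKSZ, "bytes read,", len(changed) * BLCKSZ, "bytes copied")
```

With four pages and two of them changed, the sketch reads 32768 bytes to copy 16384, which is the "additional reads" cost mentioned below.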

They can also be combined to achieve better performance.

My personal candidate is the last one, since it provides page-level
granularity, while most of the other approaches can only do file-level
incremental backups or require additional reads or calculations.

In a nutshell, with the ptrack patch PostgreSQL can track page changes on
the fly. Each time a relation page is updated, this page is marked in a
special PTRACK bitmap fork for this relation. As each page requires just
one bit in the PTRACK fork, such bitmaps are quite small (with the default
8 kB block size, 16 kB of bitmap covers 1 GB of relation data). Tracking
imposes some minor overhead on database server operation but speeds
up incremental backups significantly.
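A toy Python model of the bitmap idea, assuming one bit per block with bit N standing for block N. PtrackMap and bitmap_size_bytes() are hypothetical names; this is not the patch's on-disk fork format, and it ignores WAL logging and crash safety entirely.

```python
BLCKSZ = 8192  # PostgreSQL's default block size in bytes


class PtrackMap:
    """Hypothetical in-memory model of a per-relation change bitmap:
    bit N is set when block N has been modified since the last backup."""

    def __init__(self, nblocks: int) -> None:
        self.bits = bytearray((nblocks + 7) // 8)  # one bit per page, rounded up

    def mark(self, blkno: int) -> None:
        """Conceptually called every time block blkno is updated."""
        self.bits[blkno // 8] |= 1 << (blkno % 8)

    def is_marked(self, blkno: int) -> bool:
        return bool(self.bits[blkno // 8] & (1 << (blkno % 8)))


def bitmap_size_bytes(relation_bytes: int) -> int:
    """Bitmap size for a relation of the given size: one bit per block."""
    nblocks = (relation_bytes + BLCKSZ - 1) // BLCKSZ
    return (nblocks + 7) // 8


m = PtrackMap(nblocks=16)
m.mark(3)
m.mark(3)   # repeated updates of the same page set the same bit
m.mark(10)
print([b for b in range(16) if m.is_marked(b)])  # [3, 10]
print(bitmap_size_bytes(1 << 30))                # 16384, i.e. 16 kB per 1 GB
```

Because marking is idempotent, a page updated many times between backups still costs a single bit and is copied once.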

A detailed overview of the implementation, with all its pros and cons,
patches, and links to the related threads, can be found here:

https://wiki.postgresql.org/index.php?title=PTRACK_incremental_backups

Patches for versions 10.1 and 9.6 are attached.
Since ptrack is basically just an API for use by backup tools, it is
impossible to test the patch on its own.
It is currently integrated with our backup utility, pg_probackup, which
you can find here: https://github.com/postgrespro/pg_probackup
Let me know if you find the documentation too long and complicated, and
I'll write a brief how-to for ptrack backups.
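For illustration only, the backup-tool side of such an API might boil down to something like the following Python sketch: decode the bitmap, copy only the marked pages, and reset the map. changed_blocks() and incremental_copy() are hypothetical helpers, not pg_probackup's actual code or the ptrack API, and the bit layout is an assumption.

```python
BLCKSZ = 8192  # PostgreSQL's default block size in bytes


def changed_blocks(bitmap: bytes) -> list[int]:
    """Decode a one-bit-per-page bitmap into block numbers
    (assumed layout: bit i of byte j stands for block 8*j + i)."""
    return [byte_no * 8 + bit
            for byte_no, byte in enumerate(bitmap)
            for bit in range(8)
            if byte & (1 << bit)]


def incremental_copy(relation: bytes, bitmap: bytearray) -> dict[int, bytes]:
    """Hypothetical backup step: copy only the marked pages, then clear the
    bitmap so the next incremental backup starts from a clean slate.
    A real tool must coordinate this with concurrent writers; that is out
    of scope here."""
    delta = {}
    for blkno in changed_blocks(bitmap):
        start = blkno * BLCKSZ
        if start < len(relation):  # skip bits past the current end of the relation
            delta[blkno] = relation[start:start + BLCKSZ]
    bitmap[:] = bytes(len(bitmap))  # clear all bits
    return delta


relation = bytes(BLCKSZ * 12)                  # a 12-block relation of zero pages
bitmap = bytearray([0b00000101, 0b00001000])   # blocks 0, 2 and 11 marked
delta = incremental_copy(relation, bitmap)
print(sorted(delta))  # [0, 2, 11]
print(bitmap)         # bytearray(b'\x00\x00')
```

Only three 8 kB pages are read and copied here, with no scan of the unchanged blocks.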

Spoiler: please consider this patch and README a proof of concept. It
can be improved in some ways, but in its current state PTRACK is a
stable prototype, reviewed and tested well enough to have revealed many
non-trivial corner cases and subtle problems. Any discussion of a
change-tracking algorithm must take them into account. Feel free to share
your concerns and point out any shortcomings of the idea or the implementation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                Content-Type   Size
ptrack_10.1_v1.4.patch    text/x-patch   87.3 KB
ptrack_9.6.6_v1.4.patch   text/x-patch   87.5 KB
