Re: block-level incremental backup

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-10 14:22:38
Message-ID: 1148d018-ff98-3857-20b8-45179c0742a3@postgrespro.ru
Lists: pgsql-hackers

On 09.04.2019 18:48, Robert Haas wrote:
> 1. There should be a way to tell pg_basebackup to request from the
> server only those blocks where LSN >= threshold_value.

Some time ago I implemented an alternative version of the ptrack utility
(not the one used in pg_probackup), which detects updated blocks at the
file level. It is very simple, and maybe it can someday be integrated
into master.
I have attached a patch against vanilla PostgreSQL to this mail.
Right now it contains just two GUCs:

ptrack_map_size: Size of the ptrack map (number of elements) used for
incremental backup; 0 disables it.
ptrack_block_log: Base-2 logarithm of the ptrack block size (in pages)

and one function:

pg_ptrack_get_changeset(startlsn pg_lsn) returns
{relid,relfilenode,reltablespace,forknum,blocknum,segsize,updlsn,path}

The idea is very simple: it creates a hash map of fixed size
(ptrack_map_size) and stores the LSNs of written pages in this map.
Since the default Postgres page size seems to be too small for a ptrack
block (it would require too large a hash map or increase the number of
collisions, as well as increase the number of random reads), it is
possible to configure a ptrack block to consist of multiple pages (a
power of 2).
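
To illustrate the idea, here is a toy model in Python. It is not the code
from the attached patch: the class name, the CRC-based hash, and the choice
to keep the largest LSN in a slot on collision are all assumptions made for
the sketch. With these assumptions a collision can only report an extra
block as changed; it never hides a real change.

```python
import zlib


class PtrackMap:
    """Toy fixed-size LSN map (hypothetical; not the patch's implementation)."""

    def __init__(self, map_size: int, block_log: int = 0):
        self.map_size = map_size      # number of slots, like ptrack_map_size
        self.block_log = block_log    # log2 of pages per ptrack block
        self.slots = [0] * map_size   # 0 means "no write recorded"

    def _slot(self, relfilenode: int, forknum: int, pageno: int) -> int:
        # Pages belonging to the same ptrack block share one slot.
        block = pageno >> self.block_log
        key = f"{relfilenode}/{forknum}/{block}".encode()
        return zlib.crc32(key) % self.map_size   # assumed hash function

    def mark(self, relfilenode: int, forknum: int, pageno: int, lsn: int) -> None:
        # Record a page write; on collision keep the newest LSN.
        i = self._slot(relfilenode, forknum, pageno)
        self.slots[i] = max(self.slots[i], lsn)

    def maybe_changed(self, relfilenode: int, forknum: int,
                      pageno: int, start_lsn: int) -> bool:
        # True if the block may have been written at or after start_lsn.
        return self.slots[self._slot(relfilenode, forknum, pageno)] >= start_lsn


m = PtrackMap(map_size=1000003, block_log=3)    # 8 pages per ptrack block
m.mark(16396, 0, 1640, 0x224FD88)
print(m.maybe_changed(16396, 0, 1640, 0x224A268))   # the written page
print(m.maybe_changed(16396, 0, 1647, 0x224A268))   # same 8-page block
```

Both calls report True: page 1647 was never written, but it lies in the
same 8-page ptrack block as page 1640, which is the price of a coarser
block size.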

This patch uses memory mapping. Unfortunately there is no portable
wrapper for it in Postgres, so I had to provide my own implementations
for Unix and Windows. Certainly this is not good and should be
rewritten.
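
For illustration only, this is roughly what a file-backed map of LSN slots
looks like when the platform difference is hidden behind one wrapper. The
sketch uses Python's standard mmap module, which covers both Unix and
Windows; the file name, the 8-byte little-endian slot layout, and the
helper names are assumptions and do not come from the patch.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<Q")   # assumed layout: one 8-byte LSN per slot


def open_map(path: str, map_size: int) -> mmap.mmap:
    """Create (if needed) and memory-map a file holding map_size LSN slots."""
    size = map_size * SLOT.size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size != size:
            os.ftruncate(fd, size)    # new space is zero-filled: all LSNs are 0
        return mmap.mmap(fd, size)    # the mapping outlives the descriptor
    finally:
        os.close(fd)


def set_lsn(m: mmap.mmap, slot: int, lsn: int) -> None:
    SLOT.pack_into(m, slot * SLOT.size, lsn)


def get_lsn(m: mmap.mmap, slot: int) -> int:
    return SLOT.unpack_from(m, slot * SLOT.size)[0]


path = os.path.join(tempfile.mkdtemp(), "ptrack.map")   # hypothetical file name
m = open_map(path, 1000003)
set_lsn(m, 42, 0x224FD88)
m.flush()
m.close()

m = open_map(path, 1000003)           # reopen: the stored LSN is still there
print(hex(get_lsn(m, 42)))
m.close()
```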

How to use?

1. Define ptrack_map_size in postgresql.conf, for example (use a prime
number for more uniform hashing):

ptrack_map_size = 1000003

2. Remember the current LSN.

psql postgres -c "select pg_current_wal_lsn()"
 pg_current_wal_lsn
--------------------
 0/224A268
(1 row)

3. Do some updates.

$ pgbench -T 10 postgres

4. Select changed blocks.

 select * from pg_ptrack_get_changeset('0/224A268');
 relid | relfilenode | reltablespace | forknum | blocknum | segsize |  updlsn   |         path
-------+-------------+---------------+---------+----------+---------+-----------+----------------------
 16390 |       16396 |          1663 |       0 |     1640 |       1 | 0/224FD88 | base/12710/16396
 16390 |       16396 |          1663 |       0 |     1641 |       1 | 0/2258680 | base/12710/16396
 16390 |       16396 |          1663 |       0 |     1642 |       1 | 0/22615A0 | base/12710/16396
...
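
A client that reads this output has to compare LSNs numerically, not as
strings. A pg_lsn value is printed as two hexadecimal numbers separated by
a slash (the high and low 32 bits of a 64-bit position), so a small helper
is enough. The sketch below is only an illustration: the helper name is
made up, and the rows are the three sample rows from the output above.

```python
def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '0/224A268' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


start = parse_lsn("0/224A268")

# (blocknum, updlsn) pairs copied from the sample output
rows = [(1640, "0/224FD88"), (1641, "0/2258680"), (1642, "0/22615A0")]

changed = [blk for blk, updlsn in rows if parse_lsn(updlsn) >= start]
print(changed)

# Plain string comparison gives the wrong order for LSNs of different length:
print("0/10" > "0/9", parse_lsn("0/10") > parse_lsn("0/9"))
```

All three sample blocks pass the filter, and the last line prints
"False True", which shows why the textual form must be parsed first.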

Of course, ptrack is meant to be used as part of a backup tool (such as
pg_basebackup or pg_probackup).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
ptrack-1.patch text/x-patch 15.2 KB
