[RFC] Incremental backup v3: incremental PoC

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [RFC] Incremental backup v3: incremental PoC
Date: 2014-10-14 17:17:27
Message-ID: 543D5AA7.9@2ndquadrant.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Hackers,

following the advices gathered on the list I've prepared a third partial
patch on the way of implementing incremental pg_basebackup as described
here https://wiki.postgresql.org/wiki/Incremental_backup

== Changes

Compared to the previous version I've made the following changes:

* The backup_profile is not optional anymore. Generating it is cheap
enough not to bother the user with such a choice.

* I've isolated the code which detects the maxLSN of a segment in a
separate getMaxLSN function. At the moment it works scanning the whole
file, but I'm looking to replace it in the next versions.

* I've made possible to request an incremental backup passing a "-I
<LSN>" option to pg_basebackup. It is probably too "raw" to remain as
is, but it's is useful at this stage to test the code.

* I've modified the backup label to report the fact that the backup was
taken with the incremental option. The result will be something like:

START WAL LOCATION: 0/52000028 (file 000000010000000000000052)
START TIME: 2014-10-14 16:05:04 CEST
LABEL: pg_basebackup base backup

== Testing it

At this stage you can make an incremental file-level backup using this

pg_basebackup -v -F p -D /tmp/x -x
LSN=$(awk '/^START WAL/{print $4}' /tmp/x/backup_profile)
pg_basebackup -v -F p -D /tmp/y -I $LSN -x

the result will be an incremental backup in /tmp/y based on the full
backup on /tmp/x.

You can "reintegrate" the incremental backup in the /tmp/z directory
with the following little python script, calling it as

./recover.py /tmp/x /tmp/y /tmp/z

#!/usr/bin/env python
# recover.py

import os
import shutil
import sys

if len(sys.argv) != 4:
print >> sys.stderr, "usage: %s base incremental destination"


if os.path.exists(dest):
print >> sys.stderr, "error: destination must not exist (%s)" % dest

profile=open(os.path.join(incr, 'backup_profile'), 'r')

for line in profile:
if line.strip() == 'FILE LIST':

shutil.copytree(incr, dest)
for line in profile:
tblspc, lsn, sent, date, size, path = line.strip().split('\t')
if sent == 't' or lsn=='\\N':
base_file = os.path.join(base, path)
dest_file = os.path.join(dest, path)
shutil.copy2(base_file, dest_file)

It has obviously to be replaced by a full-fledged user tool, but it is
enough to test the concept.

== What next

I would to replace the getMaxLSN function with a more-or-less persistent
structure which contains the maxLSN for each data segment.

To make it work I would hook into the ForwardFsyncRequest() function in
src/backend/postmaster/checkpointer.c and update an in memory hash every
time a block is going to be fsynced. The structure could be persisted on
disk at some time (probably on checkpoint).

I think a good key for the hash would be a BufferTag with blocknum
"rounded" to the start of the segment.

I'm here asking for comments and advices on how to implement it in an
acceptable way.

== Disclaimer

The code here is an intermediate step, it does not contain any
documentation beside the code comments and will be subject to deep and
radical changes. However I believe it can be a base to allow PostgreSQL
to have its file-based incremental backup, and a block-based incremental
backup after it.


Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it

Attachment Content-Type Size
file-based-incremental-backup.patch text/plain 31.3 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-10-14 17:21:45 Re: Buffer Requests Trace
Previous Message Lucas Lersch 2014-10-14 17:10:57 Re: Buffer Requests Trace