Re: [RFC] Incremental backup v2: add backup profile to base backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Incremental backup v2: add backup profile to base backup
Date: 2014-10-06 15:50:07
Message-ID: CA+TgmoYdG1JvymERkGozpfazJBHTNbxSAvWMHGmK7dRioP8bAQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 6, 2014 at 11:33 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> 1. Take a full backup. Basically, we already have this. In the
>> backup label file, make sure to note the newest LSN guaranteed to be
>> present in the backup.
>
> Don't we already have it in "START WAL LOCATION"?

Yeah, probably. I was too lazy to go look for it, but that sounds
like the right thing.
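
As a rough illustration of reading that value back, here is a minimal sketch that pulls START WAL LOCATION out of a backup label and converts the "hi/lo" hex notation to a single 64-bit integer so that LSNs can be compared. The function names and the sample label text are made up for this example; only the "START WAL LOCATION:" line format comes from the backup label file itself.

```python
import re


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/2000028') to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def start_wal_location(backup_label):
    """Return the START WAL LOCATION found in backup label text, as an int."""
    match = re.search(
        r"^START WAL LOCATION: ([0-9A-Fa-f]+/[0-9A-Fa-f]+)",
        backup_label,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no START WAL LOCATION line found")
    return parse_lsn(match.group(1))


# Illustrative label contents, not taken from a real backup.
sample_label = (
    "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\n"
    "CHECKPOINT LOCATION: 0/2000060\n"
    "BACKUP METHOD: streamed\n"
)
```

With the LSN as a plain integer, "newer than the base backup" becomes an ordinary numeric comparison.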

>> 2. Take a differential backup. In the backup label file, note the LSN
>> of the full backup to which the differential backup is relative, and the
>> newest LSN guaranteed to be present in the differential backup. The
>> actual backup can consist of a series of 20-byte buffer tags, those
>> being the exact set of blocks newer than the base-backup's
>> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
>> an 8kB block of data. If a relfilenode is truncated or removed, you
>> need some way to indicate that in the backup; e.g. include a buffertag
>> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
>> or InvalidBlockNumber if removed entirely.
>
> To have a working backup you need to ship each block which is newer than
> latest-guaranteed-to-be-present in full backup and not newer than
> latest-guaranteed-to-be-present in the current backup. Also, as a
> further optimization, you can think about not sending the empty space in
> the middle of each page.

Right. Or compressing the data.
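
To make the selection rule and the "empty space" optimization concrete, here is a minimal sketch, not a proposed implementation. It assumes the standard page header layout (pd_lsn in the first 8 bytes, pd_lower and pd_upper at byte offsets 12 and 14), a little-endian machine, and that the hole between pd_lower and pd_upper can be restored as zeros. The function names are invented for this example.

```python
import struct

BLCKSZ = 8192  # default PostgreSQL block size


def page_lsn(page):
    """Read pd_lsn (two 32-bit halves) from the start of the page header."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def needs_shipping(page, base_lsn, current_lsn):
    """True if the page is newer than the full backup's LSN and not newer
    than the LSN guaranteed to be present in the current backup."""
    return base_lsn < page_lsn(page) <= current_lsn


def strip_hole(page):
    """Drop the free space between pd_lower and pd_upper.

    Returns (lower, upper, payload); if the header looks inconsistent the
    whole page is kept and (0, 0, page) is returned.
    """
    lower, upper = struct.unpack_from("<HH", page, 12)
    if not (lower <= upper <= len(page)):
        return 0, 0, bytes(page)
    return lower, upper, bytes(page[:lower]) + bytes(page[upper:])


def restore_hole(lower, upper, payload):
    """Rebuild the page, filling the hole with zeros (an assumption)."""
    return payload[:lower] + bytes(upper - lower) + payload[lower:]
```

Compressing each block would be an alternative to removing the hole; the two are not mutually exclusive.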

> My main concern here is about how postgres can remember that a
> relfilenode has been deleted, in order to send the appropriate "deletion
> tag".

You also need to handle truncation.

> IMHO the easiest way is to send the full list of files along with the
> backup and leave to the client the task of deleting unneeded files. The
> backup profile has this purpose.
>
> Moreover, I do not like the idea of using only a stream of blocks as the
> actual differential backup, for the following reasons:
>
> * AFAIK, with the current infrastructure, you cannot do a backup with a
> block stream only. To have a valid backup you need many files for which
> the concept of LSN doesn't apply.
>
> * I don't like to have all the data from the various
> tablespace/db/whatever all mixed in the same stream. I'd prefer to have
> the blocks saved on a per file basis.

OK, that makes sense. But you still only need the file list when
sending a differential backup, not when sending a full backup. So
maybe a differential backup looks like this:

- Ship a table-of-contents file with a list of the relation files
currently present and the length of each in blocks.
- For each file with blocks that have been modified since the original
backup, ship a file called delta_<original file name> which is of the
form <block number><changed block contents> [...].
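
A minimal sketch of how such a differential might be encoded and applied on top of a full backup follows. Everything beyond the two bullet points above is an assumption made for illustration: the 4-byte little-endian block number, the in-memory dictionaries standing in for files, and all function names. Files missing from the table of contents are dropped, and files longer than their listed length are cut back, which covers both removal and truncation.

```python
import struct

BLCKSZ = 8192  # default PostgreSQL block size


def write_delta(changed):
    """Encode {block number: block contents} as repeated
    <block number><changed block contents> records."""
    out = bytearray()
    for blkno in sorted(changed):
        block = changed[blkno]
        if len(block) != BLCKSZ:
            raise ValueError("block %d is not %d bytes" % (blkno, BLCKSZ))
        out += struct.pack("<I", blkno) + block
    return bytes(out)


def read_delta(delta):
    """Yield (block number, block contents) pairs from a delta file body."""
    record = 4 + BLCKSZ
    if len(delta) % record != 0:
        raise ValueError("delta length is not a whole number of records")
    for off in range(0, len(delta), record):
        (blkno,) = struct.unpack_from("<I", delta, off)
        yield blkno, delta[off + 4:off + record]


def apply_differential(full, toc, deltas):
    """Rebuild relation files from a full backup plus a differential.

    full:   {file name: contents} from the full backup
    toc:    {file name: length in blocks} from the table of contents
    deltas: {file name: body of delta_<file name>}
    """
    restored = {}
    for name, nblocks in toc.items():
        size = nblocks * BLCKSZ
        data = bytearray(full.get(name, b"")[:size])  # handles truncation
        data.extend(bytes(size - len(data)))  # placeholder for new blocks
        for blkno, block in read_delta(deltas.get(name, b"")):
            if blkno >= nblocks:
                raise ValueError("block beyond length listed for " + name)
            data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ] = block
        restored[name] = bytes(data)
    return restored  # files absent from toc are simply not carried over
```

Keeping one delta per original file also keeps the data from different tablespaces and databases separate, which addresses the second objection quoted above.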

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
