Re: [RFC] Incremental backup v2: add backup profile to base backup

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Incremental backup v2: add backup profile to base backup
Date: 2014-10-06 15:33:40
Message-ID: 5432B654.6000003@2ndquadrant.it
Lists: pgsql-hackers

On 03/10/14 22:47, Robert Haas wrote:
> On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini
> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>> On 03/10/14 17:53, Heikki Linnakangas wrote:
>>> If we're going to need a profile file - and I'm not convinced of that -
>>> is there any reason to not always include it in the backup?
>>
>> The main reason is to have a centralized list of files that need to be
>> present. Without a profile, you have to insert some sort of placeholder
>> for skipped files.
>
> Why do you need to do that? And where do you need to do that?
>
> It seems to me that there are three interesting operations:
>
> 1. Take a full backup. Basically, we already have this. In the
> backup label file, make sure to note the newest LSN guaranteed to be
> present in the backup.

Don't we already have it in "START WAL LOCATION"?
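For reference, backup_label records that position as an LSN in %X/%X hex notation. Below is a minimal Python sketch of extracting it as a comparable 64-bit integer. The sample label text is illustrative, the helper names are hypothetical, and whether this value is exactly the "newest LSN guaranteed to be present" Robert describes is the question being asked, not something the sketch settles.

```python
import re

# Matches e.g. "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)"
_START_RE = re.compile(
    r"^START WAL LOCATION: ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)", re.MULTILINE
)


def parse_lsn(hi: str, lo: str) -> int:
    """Combine the two hex halves of an X/X LSN into one 64-bit integer."""
    return (int(hi, 16) << 32) | int(lo, 16)


def start_wal_location(backup_label: str) -> int:
    """Return the START WAL LOCATION of a backup_label text as an integer LSN."""
    m = _START_RE.search(backup_label)
    if m is None:
        raise ValueError("no START WAL LOCATION line in backup_label")
    return parse_lsn(m.group(1), m.group(2))


# Illustrative sample, not taken from a real cluster.
SAMPLE_LABEL = """START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
"""

print(hex(start_wal_location(SAMPLE_LABEL)))  # 0x2000028
```

Turning the LSN into a plain integer makes the later "newer than" comparisons on page LSNs trivial.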

>
> 2. Take a differential backup. In the backup label file, note the LSN
> of the full backup to which the differential backup is relative, and the
> newest LSN guaranteed to be present in the differential backup. The
> actual backup can consist of a series of 20-byte buffer tags, those
> being the exact set of blocks newer than the base-backup's
> latest-guaranteed-to-be-present LSN. Each buffer tag is followed by
> an 8kB block of data. If a relfilenode is truncated or removed, you
> need some way to indicate that in the backup; e.g. include a buffer tag
> with forknum = -(forknum + 1) and blocknum = the new number of blocks,
> or InvalidBlockNumber if removed entirely.
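Robert's proposed record layout can be sketched in Python as below. Assumptions: the 20 bytes are the BufferTag fields of this era (tablespace, database and relation OIDs, signed fork number, block number) packed little-endian without padding, and truncation/removal records carry no page payload. The helper names are hypothetical.

```python
import struct

BLCKSZ = 8192                      # default PostgreSQL block size
INVALID_BLOCK_NUMBER = 0xFFFFFFFF  # InvalidBlockNumber

# spcNode, dbNode, relNode (uint32 Oids), forkNum (signed int), blockNum (uint32):
# 3*4 + 4 + 4 = 20 bytes. Little-endian layout is an assumption of this sketch.
TAG = struct.Struct("<IIIiI")


def block_record(spc, db, rel, fork, blkno, page: bytes) -> bytes:
    """A 20-byte buffer tag followed by one full 8kB block."""
    if len(page) != BLCKSZ:
        raise ValueError("page must be exactly BLCKSZ bytes")
    return TAG.pack(spc, db, rel, fork, blkno) + page


def truncate_record(spc, db, rel, fork, new_nblocks=None) -> bytes:
    """Tag-only record: forknum encoded as -(forknum + 1); blocknum is the new
    length in blocks, or InvalidBlockNumber if the relation was removed."""
    blkno = INVALID_BLOCK_NUMBER if new_nblocks is None else new_nblocks
    return TAG.pack(spc, db, rel, -(fork + 1), blkno)


def read_records(stream: bytes):
    """Yield ("block", key, blkno, page) or ("truncate", key, nblocks, None),
    where key is (spc, db, rel, fork) and nblocks is None for a removal."""
    pos = 0
    while pos < len(stream):
        spc, db, rel, fork, blkno = TAG.unpack_from(stream, pos)
        pos += TAG.size
        if fork < 0:
            # Negative fork number marks truncation/removal; no page follows.
            nblocks = None if blkno == INVALID_BLOCK_NUMBER else blkno
            yield ("truncate", (spc, db, rel, -fork - 1), nblocks, None)
        else:
            page = stream[pos:pos + BLCKSZ]
            if len(page) != BLCKSZ:
                raise ValueError("stream ends in the middle of a block")
            pos += BLCKSZ
            yield ("block", (spc, db, rel, fork), blkno, page)
```

Reusing the sign of the fork number keeps every record header the same 20 bytes, so a reader never needs a separate record-type byte.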

To have a working backup you need to ship every block whose LSN is newer
than the latest-guaranteed-to-be-present LSN of the full backup and not
newer than the latest-guaranteed-to-be-present LSN of the current backup.
As a further optimization, you could also skip sending the unused space
in the middle of each page (the hole between pd_lower and pd_upper).
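Both rules can be sketched in Python. Assumptions: pages use the standard page header layout (pd_lsn as two uint32 halves at offset 0, pd_lower and pd_upper as uint16 at offsets 12 and 14) in little-endian byte order, and the hole is restored as zeros, the same trick WAL full-page images use. Function names are hypothetical.

```python
import struct

BLCKSZ = 8192
SIZE_OF_PAGE_HEADER = 24  # SizeOfPageHeaderData


def page_lsn(page: bytes) -> int:
    """pd_lsn is stored as (xlogid, xrecoff) uint32 halves at offset 0."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def needs_shipping(page: bytes, full_lsn: int, current_lsn: int) -> bool:
    """Ship the block if full_lsn < page LSN <= current_lsn."""
    return full_lsn < page_lsn(page) <= current_lsn


def strip_hole(page: bytes) -> bytes:
    """Drop the free space between pd_lower and pd_upper."""
    pd_lower, pd_upper = struct.unpack_from("<HH", page, 12)
    if SIZE_OF_PAGE_HEADER <= pd_lower <= pd_upper <= BLCKSZ:
        return page[:pd_lower] + page[pd_upper:]
    return page  # e.g. an all-zero new page: send it unchanged


def restore_hole(data: bytes) -> bytes:
    """Inverse of strip_hole; the hole is refilled with zeros."""
    if len(data) == BLCKSZ:
        return data  # nothing was stripped
    # pd_lower/pd_upper live in the header, which strip_hole always keeps.
    pd_lower, pd_upper = struct.unpack_from("<HH", data, 12)
    return data[:pd_lower] + bytes(pd_upper - pd_lower) + data[pd_lower:]
```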

My main concern here is how PostgreSQL can remember that a relfilenode
has been deleted, so that it can send the appropriate "deletion tag".

IMHO the easiest way is to send the full list of files along with the
backup and leave it to the client to delete unneeded files. That is the
purpose of the backup profile.
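The client-side half of that idea is small: after restoring, remove whatever the profile does not list. A minimal Python sketch, assuming the profile is just a list of '/'-separated paths relative to the data directory (the real profile format is what this thread has yet to settle); the function names are hypothetical.

```python
import os


def stale_files(existing, profile):
    """Paths present on disk but absent from the backup profile."""
    return sorted(set(existing) - set(profile))


def list_files(root):
    """All regular files under root, relative to root, with '/' separators."""
    out = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            out.append(rel.replace(os.sep, "/"))
    return out


def prune_to_profile(root, profile):
    """Delete every file under root that the profile does not list."""
    removed = stale_files(list_files(root), profile)
    for rel in removed:
        os.remove(os.path.join(root, *rel.split("/")))
    return removed
```

With this approach the server never has to remember deletions: a dropped relation simply stops appearing in the profile.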

Moreover, I do not like the idea of using only a stream of blocks as the
actual differential backup, for the following reasons:

* AFAIK, with the current infrastructure, you cannot take a backup
consisting of a block stream alone. To have a valid backup you need many
files to which the concept of LSN doesn't apply.

* I don't like having the data from the various
tablespaces/databases/whatever all mixed in the same stream. I'd prefer
to have the blocks saved on a per-file basis.
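To make the per-file alternative concrete: an interleaved block stream keyed by buffer tags can be regrouped into the files PostgreSQL actually stores on disk. This Python sketch assumes the default tablespace (base/<dbOid>/<relfilenode>), the standard fork suffixes, and the default 1GB segment size (RELSEG_SIZE of 131072 8kB blocks); the function names are hypothetical.

```python
from collections import defaultdict

BLCKSZ = 8192
RELSEG_SIZE = 131072          # blocks per 1GB segment with default build options
DEFAULTTABLESPACE_OID = 1663  # pg_default, stored under base/
FORK_SUFFIX = {0: "", 1: "_fsm", 2: "_vm", 3: "_init"}  # main, fsm, vm, init


def segment_location(db, rel, fork, blkno):
    """Map a block in the default tablespace to (relative path, byte offset)."""
    path = "base/%d/%d%s" % (db, rel, FORK_SUFFIX[fork])
    segno, blk_in_seg = divmod(blkno, RELSEG_SIZE)
    if segno > 0:
        path += ".%d" % segno
    return path, blk_in_seg * BLCKSZ


def group_by_file(blocks):
    """Regroup (spc, db, rel, fork, blkno, page) tuples into per-file lists
    of (offset, page), sorted by offset."""
    files = defaultdict(list)
    for spc, db, rel, fork, blkno, page in blocks:
        if spc != DEFAULTTABLESPACE_OID:
            raise NotImplementedError("sketch only handles the default tablespace")
        path, offset = segment_location(db, rel, fork, blkno)
        files[path].append((offset, page))
    return {p: sorted(e, key=lambda item: item[0]) for p, e in files.items()}
```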

>
> 3. Apply a differential backup to a full backup to create an updated
> full backup. This is just a matter of scanning the full backup and
> the differential backup and applying the changes in the differential
> backup to the full backup.
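Step 3 can be sketched as follows, assuming the differential records have already been mapped to file paths and the full backup is held in memory as {path: bytearray}. Records are applied in stream order; the record shapes and names are hypothetical, not an agreed format.

```python
BLCKSZ = 8192


def apply_differential(full, records):
    """Apply differential records to a full backup held as {path: bytearray}.

    records is an iterable of:
      ("block", path, blkno, page)      -> overwrite (or extend to) that block
      ("truncate", path, nblocks, None) -> cut the file to nblocks blocks
      ("truncate", path, None, None)    -> the file was removed entirely
    """
    for kind, path, n, page in records:
        if kind == "block":
            data = full.setdefault(path, bytearray())
            end = (n + 1) * BLCKSZ
            if len(data) < end:
                data.extend(bytes(end - len(data)))  # zero-fill any gap
            data[n * BLCKSZ:end] = page
        elif kind == "truncate" and n is None:
            full.pop(path, None)
        elif kind == "truncate":
            if path in full:
                del full[path][n * BLCKSZ:]
        else:
            raise ValueError("unknown record kind: %r" % (kind,))
    return full
```

Doing 2+3 in one pass, as Robert suggests, would amount to feeding records into this loop as they are produced instead of materializing the differential first.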
>
> You might want combinations of these, like something that does 2+3 as
> a single operation, for efficiency, or a way to copy a full backup and
> apply a differential backup to it as you go. But that's it, right?
> What else do you need?
>

Nothing else. Once we agree on the definition of the files involved and
on the protocol formats, only the actual coding remains.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
