Re: [RFC] Incremental backup v2: add backup profile to base backup

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Incremental backup v2: add backup profile to base backup
Date: 2014-10-06 11:30:10
Message-ID: 54327D42.9020504@2ndquadrant.it
Lists: pgsql-hackers

On 03/10/14 23:12, Andres Freund wrote:
> On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote:
>> I've updated the wiki page
>> https://wiki.postgresql.org/wiki/Incremental_backup following the result
>> of discussion on hackers.
>>
>> Compared to first version, we switched from a timestamp+checksum based
>> approach to one based on LSN.
>>
>> This patch adds an option to pg_basebackup and to replication protocol
>> BASE_BACKUP command to generate a backup_profile file. It is almost
>> useless by itself, but it is the foundation on which we will build the
>> file based incremental backup (and hopefully a block based incremental
>> backup after it).
>>
>> Any comment will be appreciated. In particular I'd appreciate comments
>> on correctness of relnode files detection and LSN extraction code.
>
> Can you describe the algorithm you implemented in words?
>

Here is the relnode file detection algorithm:

I've added a has_relfiles parameter to the sendDir function. If
has_relfiles is true, every file in the directory is tested against the
validateRelfilenodename function. If it returns true, the maxLSN
value is computed for the file.

The sendDir function is called with has_relfiles=true by the sendTablespace
function, and by sendDir itself when recursing into a subdirectory in
either of these cases:

* has_relfiles is already true
* we are recursing into the "./global" or "./base" directory
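
The rule above can be sketched as a small predicate. This is only an
illustration: child_has_relfiles is a hypothetical helper name that does not
appear in the patch, and the comparison against the literal paths assumes
sendDir sees them exactly as "./global" and "./base".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): decide whether sendDir should
 * pass has_relfiles=true when it recurses into subdir_path.
 */
static bool
child_has_relfiles(bool parent_has_relfiles, const char *subdir_path)
{
    /* Once inside a relation-file tree, every level below stays in it. */
    if (parent_has_relfiles)
        return true;

    /* The two top-level directories that hold relation files. */
    return strcmp(subdir_path, "./global") == 0 ||
           strcmp(subdir_path, "./base") == 0;
}
```

With this rule, a directory such as "./base/16384" inherits the flag from
"./base", while a top-level directory such as "./pg_xlog" is never scanned
for relation files.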

The validateRelfilenodename function has been taken from the
pg_computemaxlsn patch.

It's short enough to be pasted here:

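static bool
validateRelfilenodename(char *name)
{
    int pos = 0;

    /* relfilenode: a run of digits */
    while ((name[pos] >= '0') && (name[pos] <= '9'))
        pos++;

    /* optional fork suffix: '_' followed by lowercase letters */
    if (name[pos] == '_')
    {
        pos++;
        while ((name[pos] >= 'a') && (name[pos] <= 'z'))
            pos++;
    }
    /* optional segment number: '.' followed by digits */
    if (name[pos] == '.')
    {
        pos++;
        while ((name[pos] >= '0') && (name[pos] <= '9'))
            pos++;
    }

    /* valid only if the whole name has been consumed */
    if (name[pos] == 0)
        return true;
    return false;
}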
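
To make the accepted pattern concrete, here is a self-contained copy of the
function together with a hypothetical counting helper. The only changes to
the function are the const qualifier and the condensed return, both added
for this sketch. count_relfile_names is not part of the patch, and the
sample names in the usage note are chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Copy of the function above; "const" is an addition made for this sketch. */
static bool
validateRelfilenodename(const char *name)
{
    int pos = 0;

    while ((name[pos] >= '0') && (name[pos] <= '9'))
        pos++;

    if (name[pos] == '_')
    {
        pos++;
        while ((name[pos] >= 'a') && (name[pos] <= 'z'))
            pos++;
    }
    if (name[pos] == '.')
    {
        pos++;
        while ((name[pos] >= '0') && (name[pos] <= '9'))
            pos++;
    }

    return name[pos] == 0;
}

/* Hypothetical helper: count how many of n names pass the check. */
static size_t
count_relfile_names(const char *const *names, size_t n)
{
    size_t i, accepted = 0;

    for (i = 0; i < n; i++)
        if (validateRelfilenodename(names[i]))
            accepted++;
    return accepted;
}
```

Names such as "16384", "16384_fsm", "16384.1" and "16384_vm.2" are accepted,
while "pg_control", "PG_VERSION", "pg_filenode.map" and "t3_16384" are
rejected. As written, the function does not require a leading digit, so an
empty string and a name like "_init" are accepted as well.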

To compute the maxLSN for a file, I rely on the fact that the file is sent
in TAR_SEND_SIZE chunks (32 kB), a size that is always a multiple of the
block size, so each chunk can be scanned page by page. I've added the
following code inside the send loop:

+        char *page;
+
+        /* Scan every page to find the max file LSN */
+        for (page = buf; page < buf + (off_t) cnt; page += (off_t) BLCKSZ) {
+            pagelsn = PageGetLSN(page);
+            if (filemaxlsn < pagelsn)
+                filemaxlsn = pagelsn;
+        }
+
Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
