Re: Differential backup

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Differential backup
Date: 2010-04-27 14:14:22
Message-ID: 0A354A31-837F-4F2C-8F95-4FE6C772D3AC@phlo.org
Lists: pgsql-hackers

On Apr 27, 2010, at 15:50 , Alvaro Herrera wrote:
> Simon Riggs wrote:
>> Thinking about allowing a backup to tell which files have changed in the
>> database since last backup. This would allow an external utility to copy
>> away only changed files.
>>
>> Now there's a few ways of doing this and many will say this is already
>> possible using file access times.
>>
>> An explicit mechanism where Postgres could authoritatively say which
>> files have changed would make many feel safer, especially when other
>> databases also do this.
>>
>> We keep track of which files require fsync(), so we could also keep
>> track of changed files using that same information.
>
> Why file level? Seems a bit too coarse (particularly if you have large
> file support enabled). Maybe we could keep block-level last change info
> in a separate fork.

Hm, but most backup solutions work per-file and not per-block, so file-level tracking probably has more use cases than block-level tracking.

In any case, it seems that this information could easily be extracted from the WAL. The archive_command could call a simple tool that parses the WAL and tracks the latest LSN per database file, page, or whatever granularity is required. This, together with the backup label of the last backup, should be enough to compute the list of changed files, I think.
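
A rough sketch of the bookkeeping such a tool might do, in Python. It assumes the WAL has already been decoded elsewhere into (file path, LSN) pairs; the decoding itself is not shown. The function names and the record format are hypothetical, made up for illustration. The only PostgreSQL-specific details relied on are the "hi/lo" hexadecimal notation for LSNs and the START WAL LOCATION line in the backup label.

```python
import re


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex into one integer for comparison."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def backup_start_lsn(backup_label_text):
    """Extract the start LSN from the contents of a backup label file."""
    match = re.search(
        r"^START WAL LOCATION: ([0-9A-Fa-f]+/[0-9A-Fa-f]+)",
        backup_label_text,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no START WAL LOCATION line found")
    return parse_lsn(match.group(1))


def update_latest_lsn(latest, records):
    """Fold decoded WAL records into a {file path: latest LSN} map.

    'records' is an iterable of (path, lsn_text) pairs. This input format
    is an assumption; a real tool would have to parse WAL segments itself.
    """
    for path, lsn_text in records:
        lsn = parse_lsn(lsn_text)
        if lsn > latest.get(path, -1):
            latest[path] = lsn
    return latest


def changed_since(latest, start_lsn):
    """Return the files whose latest change is at or after start_lsn."""
    return sorted(path for path, lsn in latest.items() if lsn >= start_lsn)
```

The tool would call update_latest_lsn() once per archived segment and persist the map between runs. Comparing against the start LSN of the last backup errs on the side of copying a file again when it was modified while that backup was running.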

best regards,
Florian Pflug
