From: | Jim Nasby <jim(at)nasby(dot)net> |
---|---|
To: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Implementing incremental backup |
Date: | 2013-06-19 18:54:15 |
Message-ID: | 51C1FE57.2010207@nasby.net |
Lists: | pgsql-hackers |
On 6/19/13 11:02 AM, Claudio Freire wrote:
> On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>>
>> For now, my idea is pretty vague.
>>
>> - Record info about modified blocks. We don't need to remember the
>> whole history of a block if the block was modified multiple times.
>> We just remember that the block was modified since the last
>> incremental backup was taken.
>>
>> - The info could be obtained by trapping calls to mdwrite() etc. We need
>> to be careful to avoid such blocks used in xlogs and temporary
>> tables to not waste resource.
>>
>> - If many blocks were modified in a file, we may be able to condense
>> the info as "the whole file was modified" to reduce the amount of
>> info.
>>
>> - How to take a consistent incremental backup is an issue. I can't
>> think of a clean way other than "locking whole cluster", which is
>> obviously unacceptable. Maybe we should give up "hot backup"?
>
>
> I don't see how this is better than snapshotting at the filesystem
> level. I have no experience with TB scale databases (I've been limited
> to only hundreds of GB), but from my limited mid-size db experience,
> filesystem snapshotting is pretty much the same thing you propose
> there (xfs_freeze), and it works pretty well. There's even automated
> tools to do that, like bacula, and they can handle incremental
> snapshots.
A snapshot is not the same as an incremental backup; it presents itself as a full copy of the filesystem. In fact, since it lives on the same underlying storage as the original, a snapshot isn't really a good backup at all.
The proposal (at least as I read it) is to provide a means to easily deal with *only* the data that has actually *changed* since the last backup was taken.
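To make that concrete, here is a rough sketch of the bookkeeping Tatsuo's bullets describe: remember only *that* a block changed since the last incremental backup, skip temporary relations, and collapse to "the whole file was modified" once a file has too many dirty blocks. Everything in it is hypothetical (the ChangeTracker class, note_write(), condense_threshold, the example paths); nothing here is an existing PostgreSQL API, and it says nothing about how to get a consistent backup, which is the hard part.

```python
from dataclasses import dataclass, field

# Sentinel meaning "treat the whole file as modified" (hypothetical convention).
WHOLE_FILE = None


@dataclass
class ChangeTracker:
    # Hypothetical threshold: past this many dirty blocks, stop tracking
    # individual blocks and just record the file as fully modified.
    condense_threshold: int = 1024
    # file path -> set of dirty block numbers, or WHOLE_FILE once condensed
    dirty: dict = field(default_factory=dict)

    def note_write(self, path: str, blockno: int, is_temp: bool = False) -> None:
        """Record that a block was written (imagine this called from a write hook)."""
        if is_temp:
            return  # temporary tables are never backed up, so don't track them
        blocks = self.dirty.setdefault(path, set())
        if blocks is WHOLE_FILE:
            return  # already condensed; nothing more to remember
        blocks.add(blockno)  # a set, so repeated writes to one block cost nothing
        if len(blocks) > self.condense_threshold:
            self.dirty[path] = WHOLE_FILE

    def take_incremental(self) -> dict:
        """Return what changed since the last call and start a fresh tracking window."""
        changed, self.dirty = self.dirty, {}
        return changed


if __name__ == "__main__":
    tracker = ChangeTracker(condense_threshold=2)
    for blk in (5, 5, 7):
        tracker.note_write("base/1/100", blk)
    tracker.note_write("base/1/200", 1, is_temp=True)
    print(tracker.take_incremental())
```

The backup tool would then copy only the listed blocks (or the whole file where the entry is condensed), which is the difference from a snapshot that looks like a full copy.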
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net