Re: Proposal: Incremental Backup

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: desmodemone <desmodemone(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Incremental Backup
Date: 2014-08-01 17:05:21
Message-ID: CAGTBQpbS2Zo0_-t5oeDUXH8MS5X5LAnGdQF7JK2y=AXn2mi2rA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 1, 2014 at 1:43 PM, desmodemone <desmodemone(at)gmail(dot)com> wrote:
>
>
>
> 2014-08-01 18:20 GMT+02:00 Claudio Freire <klaussfreire(at)gmail(dot)com>:
>
>> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >> c) the map is not crash safe by design, because it is needed only for
>> >> incremental backup, to track which blocks need to be backed up, not for
>> >> consistency or recovery of the whole cluster, so it's not a heavy cost
>> >> for the whole cluster to maintain. We could consider an option (but it's
>> >> heavy) to write it to file at every flush to get a crash-safe map, but I
>> >> don't think it's that useful. I think it's acceptable, and probably better,
>> >> to enforce that and say: "if your db crashes, you need a full backup".
>> >
>> > I am not sure this assumption is right/acceptable; how can we say
>> > that in such a case users will be okay with taking a full backup?
>> > In general, taking a full backup is a very heavy operation and we
>> > should try to avoid such a situation.
>>
>>
>> Besides, whoever takes the backup (i.e., a script) may not be aware of
>> the need to take a full one.
>>
>> It's bad design to allow broken backups at all, IMNSHO.
>
>
> Hi Claudio,
> thanks for your observation.
> First: this case only arises after a database crash, which is not something
> that happens every day or every week. It happens in rare conditions, at
> least in my experience. If it happens very often, there are probably
> other problems.

Not so much. The issue here isn't only hardware crashes; this design
isn't even safe against software crashes.

What I mean is that an in-memory bitmap will also be out of sync if
you kill -9 (or if one of the backends is killed by the OOM killer), or
if the server runs out of disk space.

Normally, a simple restart fixes it because pg will do crash recovery
just fine, but now the bitmap is out of sync, and further backups are
broken. It's not a situation I want to face unless there's a huge
reason to go for such a design.
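To make the failure mode concrete, here is a toy model (hypothetical, not PostgreSQL code) of a change-tracking bitmap kept only in RAM. Block contents are durable, as they would be after normal WAL-based crash recovery, but the set of changed blocks lives in process memory. The class and function names are illustrative assumptions.

```python
class InMemoryTracker:
    """Toy model: data blocks are durable, the change bitmap lives only in RAM."""

    def __init__(self, nblocks):
        self.blocks = [0] * nblocks   # durable block contents (a version counter per block)
        self.changed = set()          # in-memory bitmap of blocks modified since last backup

    def write_block(self, blkno):
        self.blocks[blkno] += 1       # the change itself survives a crash (WAL-protected)
        self.changed.add(blkno)       # ...but the tracking bit does not

    def crash_and_recover(self):
        # kill -9 or the OOM killer: crash recovery restores the data,
        # but the in-memory bitmap starts out empty again.
        self.changed = set()

    def full_backup(self):
        self.changed = set()
        return list(self.blocks)

    def incremental_backup(self):
        delta = {blk: self.blocks[blk] for blk in sorted(self.changed)}
        self.changed = set()
        return delta


def restore(full, *deltas):
    """Apply incremental deltas on top of a full backup image."""
    image = list(full)
    for delta in deltas:
        for blk, contents in delta.items():
            image[blk] = contents
    return image


db = InMemoryTracker(8)
base = db.full_backup()
db.write_block(2)
db.write_block(5)
db.crash_and_recover()        # bits for blocks 2 and 5 are silently lost
db.write_block(7)
incr = db.incremental_backup()
print(restore(base, incr) == db.blocks)   # False: blocks 2 and 5 are stale
```

Nothing reports an error here: the incremental backup completes normally and is simply wrong, which is what makes the case dangerous.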

If you make it so that the commit includes flipping the bitmap, it can
be done cleverly enough to avoid too much overhead (though there will
be some), and you guarantee that any block about to be touched becomes
part of the next backup. You just apply all the bitmap changes in batch
after a checkpoint, before syncing to disk, and before erasing the WAL
segments. Simple, relatively efficient, and far more robust than an
in-memory thing.
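A minimal sketch of the WAL-linked approach, under simplifying assumptions: every WAL record names the block it touches, the bitmap is durable, and at each checkpoint the bitmap bits implied by the WAL since the previous checkpoint are applied in one batch before that WAL may be removed. The class, its methods, and the end-of-recovery checkpoint in `crash_and_recover` are modeling choices for illustration, not PostgreSQL internals.

```python
class WalLinkedTracker:
    """Toy model: durable bitmap, updated in batch from WAL at checkpoint."""

    def __init__(self, nblocks):
        self.blocks = [0] * nblocks   # durable data blocks (a version counter per block)
        self.wal = []                 # durable WAL: block numbers touched since last checkpoint
        self.bitmap = set()           # durable on-disk bitmap (survives crashes)

    def write_block(self, blkno):
        self.wal.append(blkno)        # the WAL record already identifies the block
        self.blocks[blkno] += 1

    def checkpoint(self):
        # Apply all pending bitmap changes in one batch and persist them,
        # and only then let the old WAL go.
        self.bitmap.update(self.wal)
        self.wal = []

    def crash_and_recover(self):
        # Process memory is lost, but in this model nothing lived only in
        # memory: the WAL since the last checkpoint is still on disk, so an
        # end-of-recovery checkpoint folds those blocks into the bitmap.
        self.checkpoint()

    def full_backup(self):
        self.checkpoint()
        self.bitmap = set()
        return list(self.blocks)

    def incremental_backup(self):
        self.checkpoint()
        delta = {blk: self.blocks[blk] for blk in sorted(self.bitmap)}
        self.bitmap = set()
        return delta


def restore(full, *deltas):
    """Apply incremental deltas on top of a full backup image."""
    image = list(full)
    for delta in deltas:
        for blk, contents in delta.items():
            image[blk] = contents
    return image


db = WalLinkedTracker(8)
base = db.full_backup()
db.write_block(2)
db.checkpoint()
db.write_block(5)
db.crash_and_recover()        # block 5 is recovered from WAL, not lost
db.write_block(7)
incr = db.incremental_backup()
print(restore(base, incr) == db.blocks)   # True
```

The extra cost is confined to the checkpoint, where the bitmap pages have to be written alongside the data pages.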

Still, it *can* double checkpoint I/O in the worst case, and it's not
an unfathomable case either.
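A rough count of where the doubling comes from, assuming a bitmap with one bit per 8 KB data block stored in 8 KB pages (the thread doesn't fix a layout, so this is an assumption). One map page then covers 65536 blocks, i.e. 512 MiB of data. Clustered writes share a map page; writes scattered one per 512 MiB stretch each need their own.

```python
BLCKSZ = 8192                      # assumed page size for both data and map pages
BITS_PER_MAP_PAGE = BLCKSZ * 8     # one bit per data block -> 65536 blocks per map page


def checkpoint_writes(dirty_blocks):
    """Return (data page writes, map page writes) for one checkpoint."""
    data_writes = len(set(dirty_blocks))
    map_writes = len({blk // BITS_PER_MAP_PAGE for blk in dirty_blocks})
    return data_writes, map_writes


clustered = list(range(1000))                             # 1000 adjacent blocks
scattered = [i * BITS_PER_MAP_PAGE for i in range(1000)]  # one block per 512 MiB stretch
print(checkpoint_writes(clustered))   # (1000, 1)
print(checkpoint_writes(scattered))   # (1000, 1000): page writes double
```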

> Second: to avoid the problem of knowing whether the db needs a full backup
> to rebuild the map, we could write the backup reference (for example, an id
> and an LSN) in the map header, so if someone/something tries to do an
> incremental backup after a crash, the map header will not list any full
> backup [because it will be empty], and it will automatically switch to a
> full one. I think it's good practice to take a full backup after a crash, to
> check for problems in files or filesystems, but if I am wrong I am happy to
> know :) .
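The header check proposed above could look roughly like this. The field names, the id string, and the LSN value are hypothetical; the only point is that an empty header (as left after a crash) forces a full backup without the caller having to know a crash happened.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ChangeMap:
    # Hypothetical header: the full backup this map is relative to.
    backup_id: Optional[str] = None
    backup_lsn: Optional[int] = None
    changed: Set[int] = field(default_factory=set)


def choose_backup_type(cmap: ChangeMap) -> str:
    # An empty header means the map was rebuilt from scratch (e.g. after a
    # crash), so there is no reference backup and only a full one is safe.
    if cmap.backup_id is None or cmap.backup_lsn is None:
        return "full"
    return "incremental"


after_crash = ChangeMap()
normal = ChangeMap(backup_id="base-0001", backup_lsn=0x16B3748, changed={3, 9})
print(choose_backup_type(after_crash))   # full
print(choose_backup_type(normal))        # incremental
```

This removes the need for the backup script to know about the crash, but it still means a full backup after every crash.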

After a crash I don't take a backup first; I verify the data
(usually VACUUM and some data consistency checks), lest I end up with a
useless backup. The backup comes after that.

But I'm no DBA guru.

> Remember that I propose a map in RAM to reduce the performance impact,
> but we could create an option to leave the choice to the user: if you want
> a crash-safe map, a map file will also be updated at every flush; if not,
> the map will stay in RAM.

I think the performance impact of a WAL-linked map isn't big enough to
justify accepting the possibility of broken backups. I wouldn't even allow it.

Making it crash safe isn't free, but it's not that expensive
either. If you want to support incremental backups, you really really
need to make sure those backups are correct and usable, and IMV
anything short of full crash safety will be too fragile for that
purpose. I don't want to be in a position of needing the backup and
finding out it's inconsistent after the fact, and I don't want to
encourage people to set themselves up for that by adding that "faster
but unsafe backups" flag.

I'd do it either safe, or not at all.
