Re: Proposal: Incremental Backup

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>
Cc: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, desmodemone <desmodemone(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Incremental Backup
Date: 2014-08-12 14:40:07
Message-ID: CAGTBQpY6rcrcurDtCcOGc7Ac8zjrizgz4tyNH4vyYLjXxNQ_0Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini
<gabriele(dot)bartolini(at)2ndquadrant(dot)it> wrote:
>
> 2014-08-12 15:25 GMT+02:00 Claudio Freire <klaussfreire(at)gmail(dot)com>:
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>
> Can I ask you what you are currently using for backing up large data
> sets with Postgres?

Currently, a time-delayed WAL archive hot standby, pg_dump sparingly,
filesystem snapshots (incremental) of the standby more often, with the
standby down.

When I didn't have the standby, I did online filesystem snapshots of
the master with WAL archiving to prevent inconsistency due to
snapshots not being atomic.

On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini
<marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
> On 12/08/14 15:25, Claudio Freire wrote:
>> On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
>> <marco(dot)nenciarini(at)2ndquadrant(dot)it> wrote:
>>> To declare two files identical they must have the same size,
>>> same mtime and same *checksum*.
>>
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>>
>
> IMHO it is still good enough. We are not trying to protect against a
> malicious attack; we are using it to protect against some *casual* event.

I'm not talking about malicious attacks. With big enough data sets,
checksum collisions are much more likely to happen than with smaller
ones, and incremental backups are supposed to work for the big sets.
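To put rough numbers on that claim, here is a small sketch (mine, not from
the thread) of the standard birthday-bound approximation for pairwise
collisions among n random checksum values. The 1 TB / 8 kB page figures are
illustrative assumptions, not anything proposed in the patch:

```python
import math

def collision_probability(n_items: int, checksum_bits: int) -> float:
    """Birthday-bound approximation: probability that at least one
    pair among n_items uniformly random checksum values collides.
    p ~= 1 - exp(-n^2 / (2 * 2^bits)); -expm1 keeps precision
    when the probability is tiny."""
    space = 2.0 ** checksum_bits
    return -math.expm1(-(n_items ** 2) / (2.0 * space))

# Illustrative assumption: a 1 TB cluster split into 8 kB pages.
n_pages = (1 << 40) // 8192  # 2^27, about 134 million pages

print(f"32-bit checksum:  {collision_probability(n_pages, 32):.4f}")
print(f"64-bit checksum:  {collision_probability(n_pages, 64):.3e}")
print(f"128-bit checksum: {collision_probability(n_pages, 128):.3e}")
```

With a 32-bit checksum a pairwise collision in that set is essentially
certain; even 64 bits leaves a non-negligible chance. (The birthday model is
the pessimistic all-pairs case; the failure mode that matters for backups,
a *changed* file colliding with its own previous checksum, is rarer, but
grows linearly with the number of files either way.)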

You could use strong cryptographic checksums, but even those aren't
perfect, and even if you accept the slim chance of collision, they are
quite expensive to compute, so they're bound to be a bottleneck with
good I/O subsystems. Checking the LSN is much cheaper.
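As a rough illustration of the LSN-based approach (my sketch, not code from
any patch in this thread): every PostgreSQL heap page begins its header with
pd_lsn, the WAL position of the last record that touched the page, stored as
two 32-bit halves. An incremental backup tool could scan page headers and
keep only pages stamped after the previous backup's start LSN. This assumes
the default 8 kB page size and little-endian storage:

```python
import struct

BLCKSZ = 8192  # default PostgreSQL page size (assumption)

def page_lsns(path):
    """Yield (page_number, lsn) for each page of a relation file.
    The first 8 bytes of the page header are pd_lsn, laid out as
    two unsigned 32-bit halves (xlogid, xrecoff)."""
    with open(path, "rb") as f:
        pageno = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            xlogid, xrecoff = struct.unpack_from("<II", page, 0)
            yield pageno, (xlogid << 32) | xrecoff
            pageno += 1

def changed_pages(path, backup_start_lsn):
    """Pages whose pd_lsn is newer than the previous backup's start
    LSN are the only ones the incremental backup needs to copy."""
    return [pno for pno, lsn in page_lsns(path)
            if lsn > backup_start_lsn]
```

Compared with checksumming, this reads each page once and does a single
integer comparison per page, so the cost is pure I/O. (It does rely on
pd_lsn being trustworthy, e.g. not rewritten by hint-bit-only changes
without WAL logging, which is exactly the kind of caveat this thread is
debating.)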

Still, do as you will. Since everybody keeps saying it's better than
nothing, let's let usage have the final word.
