Re: Proposal: Incremental Backup

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, desmodemone <desmodemone(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Incremental Backup
Date: 2014-08-12 23:26:51
Message-ID: 20140812232651.GG16422@tamriel.snowman.net
Lists: pgsql-hackers

Claudio,

* Claudio Freire (klaussfreire(at)gmail(dot)com) wrote:
> I'm not talking about malicious attacks, with big enough data sets,
> checksum collisions are much more likely to happen than with smaller
> ones, and incremental backups are supposed to work for the big sets.

This is an issue when you're talking about de-duplication, not when
you're talking about testing if two files are the same or not for
incremental backup purposes. The size of the overall data set in this
case is not relevant as you're only ever looking at the same (at most
1G) specific file in the PostgreSQL data directory. Even if you were able
to actually produce a file whose checksum collides with that of an
existing PG file, the chance that you'd be able to construct one which
*also* has a valid page layout, sufficient that it wouldn't be obviously
massively corrupted, is very quickly approaching zero.
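
To put rough numbers on that distinction, here is a minimal sketch. It
assumes an idealized checksum whose n-bit output is uniformly distributed,
which real checksums only approximate, and the function names are made up
for illustration. De-duplication compares every block against every other
block, so the birthday bound applies; an incremental backup compares each
file only against its own earlier checksum, so the risk grows linearly
with the number of files.

```python
import math


def pairwise_collision_probability(num_items: int, bits: int) -> float:
    """Birthday-bound approximation: chance that any two of num_items
    distinct inputs share an idealized bits-wide checksum (the
    de-duplication case, where everything is compared with everything)."""
    pairs = num_items * (num_items - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


def single_comparison_probability(bits: int) -> float:
    """Chance that one changed file has the same idealized checksum as
    its own previous version."""
    return 2.0 ** -bits


def per_file_backup_probability(num_files: int, bits: int) -> float:
    """Chance that at least one of num_files changed files goes unnoticed
    when each file is compared only with its own earlier checksum
    (the incremental backup case)."""
    return -math.expm1(num_files * math.log1p(-(2.0 ** -bits)))


if __name__ == "__main__":
    # Deliberately weak 32-bit checksum, 100,000 items.
    print(pairwise_collision_probability(100_000, 32))
    print(per_file_backup_probability(100_000, 32))
```

With a deliberately weak 32-bit checksum and 100,000 items, the pairwise
case comes out at roughly 0.69, while the per-file case is roughly 2.3e-5.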

> You could use strong cryptographic checksums, but such strong
> checksums still aren't perfect, and even if you accept the slim chance
> of collision, they are quite expensive to compute, so it's bound to be
> a bottleneck with good I/O subsystems. Checking the LSN is much
> cheaper.

For my 2c on this: I'm actually behind the idea of using the LSN (though
I have not followed this thread in any detail), but there are plenty of
existing incremental backup solutions (PG-specific and not) which work
just fine by doing checksums. If you truly feel that this is a real
concern, I'd suggest you review the rsync binary diff protocol, which is
used extensively around the world, and show reports of it failing in the
field.
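
For reference, the LSN check being discussed could look roughly like the
sketch below. This is not the proposed patch, only an illustration under
stated assumptions: the default 8kB block size, a file read on a machine
with the same byte order as the server, and hypothetical helper names
(parse_lsn, page_lsn, changed_pages). It relies on pd_lsn being the first
8 bytes of every page header, stored as two 32-bit halves. It ignores
never-initialized (all-zero) pages, whose LSN is 0, and pages torn by
concurrent writes, both of which a real tool would have to handle.

```python
import struct

PAGE_SIZE = 8192  # assumption: default BLCKSZ; it is a build-time option


def parse_lsn(text: str) -> int:
    """Convert an LSN in the usual 'X/Y' hex notation to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page: bytes) -> int:
    """Read pd_lsn from the start of a page: two unsigned 32-bit integers
    (high half, then low half) in native byte order."""
    hi, lo = struct.unpack_from("=II", page, 0)
    return (hi << 32) | lo


def changed_pages(path: str, since_lsn: int):
    """Yield (block_number, page_bytes) for each full page in the file
    whose LSN is newer than since_lsn."""
    with open(path, "rb") as f:
        block = 0
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break  # end of file (or a trailing partial page)
            if page_lsn(page) > since_lsn:
                yield block, page
            block += 1
```

Only 8 bytes per page are inspected, which is why this is so much cheaper
than checksumming the whole file, though every page still has to be read.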

Thanks,

Stephen
