Re: File based Incremental backup v8

From: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: File based Incremental backup v8
Date: 2015-03-06 14:38:56
Message-ID: CAHNtfO6urAVT22U2vaaY540Fs4RmiPnbygnLBhJ8Gk5g-u92aA@mail.gmail.com
Lists: pgsql-hackers

Hi Robert,

2015-03-06 3:10 GMT+11:00 Robert Haas <robertmhaas(at)gmail(dot)com>:

> But I agree with Fujii to the extent that I see little value in
> committing this patch in the form proposed. Being smart enough to use
> the LSN to identify changed blocks, but then sending the entirety of
> every file anyway because you don't want to go to the trouble of
> figuring out how to revise the wire protocol to identify the
> individual blocks being sent and write the tools to reconstruct a full
> backup based on that data, does not seem like enough of a win.

I believe the main point is to look at this from a user interface point of
view. If/when we switch to block-level incremental support, the change will
be completely transparent to the end user, even if we start with a
file-level approach with an LSN check.
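
To make the "file-level approach with LSN check" concrete, here is a minimal
sketch of the idea, not the code from Marco's patch. It assumes the default
8kB block size and relies on the fact that each heap/index page starts with
its LSN (pd_lsn, stored as two 32-bit halves in native byte order). The
function names and the ">=" comparison against the previous backup's start
LSN are illustrative assumptions.

```python
import struct

PAGE_SIZE = 8192  # assumed: default PostgreSQL block size (BLCKSZ)


def page_lsn(page: bytes) -> int:
    """Return the LSN stored in the first 8 bytes of a page header.

    pd_lsn is laid out as (xlogid, xrecoff), two unsigned 32-bit
    integers in the server's native byte order.
    """
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def file_changed_since(path: str, start_lsn: int) -> bool:
    """Hypothetical helper: True if any page in the relation file
    carries an LSN at or after start_lsn (the start LSN of the
    previous backup). A file for which this returns False could be
    skipped by a file-level incremental backup.
    """
    with open(path, "rb") as f:
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < 8:
                # End of file (or a trailing fragment too short to
                # hold an LSN): no changed page was found.
                return False
            if page_lsn(page) >= start_lsn:
                return True
```

With a check like this, the whole file is still sent when a single page has
changed, which is exactly the trade-off under discussion: the user-facing
interface stays the same if the granularity later moves from files to blocks.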

The win is already demonstrated by the average space and time saved by users
of very large databases (VLDBs) with a good chunk of read-only data. Our
Barman users with incremental backup (released recently; its algorithm is
comparable to the file-level backup proposed by Marco) see on average a data
deduplication ratio ranging between 50% and 70% of the cluster size.

A tangible example is depicted here, with Navionics saving 8.2TB a week
thanks to this approach (and 17 hours instead of 50 for backup time):
http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

However, even smaller databases will benefit. It is clear that very small
databases as well as frequently updated ones won't be interested in
incremental backup, but that has never been the use case for this feature.

I believe that if we still think that this approach is not worth it, we are
making a big mistake. The way I see it, this patch follows an agile
approach and it is an important step towards incremental backup on a block
basis.

> As Fujii says, if we ship this patch as written, people will just keep
> using the timestamp-based approach anyway.

I think that allowing users to back up incrementally through streaming
replication (even though based on files) will give system and database
administrators more flexibility in their disaster recovery solutions.

Thanks,
Gabriele
