Re: block-level incremental backup

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-22 07:38:18
Message-ID: a40d9787-d910-d5f4-9a2b-5c533ccd3b6d@postgrespro.ru
Lists: pgsql-hackers

On 22.04.2019 2:02, Robert Haas wrote:
> On Sat, Apr 20, 2019 at 4:32 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> Having been around for a while working on backup-related things, if I
>> was to implement the protocol for pg_basebackup today, I'd definitely
>> implement "give me a list" and "give me this file" rather than the
>> tar-based approach, because I've learned that people want to be
>> able to do parallel backups and that's a decent way to do that. I
>> wouldn't set out and implement something new that there's just no hope
>> of making parallel. Maybe the first write of pg_basebackup would still
>> be simple and serial since it's certainly more work to make a frontend
>> tool like that work in parallel, but at least the protocol would be
>> ready to support a parallel option being added later without being
>> rewritten.
>>
>> And that's really what I was trying to get at here- if we've got the
>> choice now to decide what this is going to look like from a protocol
>> level, it'd be great if we could make it able to support being used in a
>> parallel fashion, even if pg_basebackup is still single-threaded.
> I think we're getting closer to a meeting of the minds here, but I
> don't think it's intrinsically necessary to rewrite the whole method
> of operation of pg_basebackup to implement incremental backup in a
> sensible way. One could instead just do a straightforward extension
> to the existing BASE_BACKUP command to enable incremental backup.
> Then, to enable parallel full backup and all sorts of out-of-core
> hacking, one could expand the command language to allow tools to
> access individual steps: START_BACKUP, SEND_FILE_LIST,
> SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes
> for an appealing project, but I do not think there is a technical
> reason why it has to be done first. Or for that matter why it has to
> be done second. As I keep saying, incremental backup and full backup
> are separate projects and I believe it's completely reasonable for
> whoever is doing the work to decide on the order in which they would
> like to do the work.
>
> Having said that, I'm curious what people other than Stephen (and
> other pgbackrest hackers) think about the relative value of parallel
> backup vs. incremental backup. Stephen appears quite convinced that
> parallel backup is full of win and incremental backup is a bit of a
> yawn by comparison, and while I certainly would not want to discount
> the value of his experience in this area, it sometimes happens on this
> mailing list that [ drum roll please ] not everybody agrees about
> everything. So, what do other people think?
>

Based on the experience of pg_probackup users, I can say that there is
no 100% winner: depending on the use case, either parallel or
incremental backup is preferable.
- If the database is not very large and the update rate is high
enough, then parallel backup within one data center is definitely the
more efficient solution.
- If the database is very large and data is rarely updated, or the
database is mostly append-only, then incremental backup is preferable.
- Some customers need to collect, at a central server, backups of
databases installed at many nodes with slow and unreliable connections
(imagine a DBMS installed on locomotives). Parallelism definitely cannot
help here, unlike support for incremental backup.
- Parallel backup consumes system resources more aggressively,
interfering with the normal work of the application. So performing a
parallel backup may cause significant degradation of application speed.
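
To make the first three cases concrete, here is a rough back-of-envelope
sketch in Python. It is not taken from pg_probackup: the function names,
the throughput figures, and the simplifications (throughput is limited
only by the network link or by per-stream read speed, and detecting
changed blocks costs nothing) are all assumptions for illustration.

```python
# Back-of-envelope model. All names and numbers are illustrative assumptions,
# not measurements and not pg_probackup behaviour.

GB = 10**9
MB = 10**6


def parallel_full_seconds(db_bytes, link_rate, stream_rate, workers):
    """Full copy using `workers` streams; total rate is capped by the link."""
    rate = min(link_rate, stream_rate * workers)
    return db_bytes / rate


def incremental_seconds(db_bytes, changed_fraction, link_rate, stream_rate):
    """Single-stream copy of only the changed blocks.

    Assumes change detection itself is free, which is optimistic.
    """
    rate = min(link_rate, stream_rate)
    return db_bytes * changed_fraction / rate


def faster_strategy(db_bytes, changed_fraction, link_rate, stream_rate, workers):
    """Return which of the two strategies finishes sooner under this model."""
    full = parallel_full_seconds(db_bytes, link_rate, stream_rate, workers)
    incr = incremental_seconds(db_bytes, changed_fraction, link_rate, stream_rate)
    return "parallel" if full < incr else "incremental"


# (db size, changed fraction, link bytes/s, per-stream bytes/s, workers)
scenarios = {
    "small, heavily updated, fast LAN": (100 * GB, 0.9, 1250 * MB, 200 * MB, 4),
    "very large, rarely updated": (10_000 * GB, 0.01, 1250 * MB, 200 * MB, 4),
    "remote node on a slow link": (100 * GB, 0.05, 1 * MB, 200 * MB, 4),
}

for name, args in scenarios.items():
    print(f"{name}: {faster_strategy(*args)}")
```

In the slow-link scenario the link is the bottleneck, so adding workers
does not change the full-backup time at all. The model deliberately says
nothing about the last point above (load on the production system),
which has to be weighed separately.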

pg_probackup supports both features, parallel and incremental backups,
and it is up to the user to choose which is more efficient for a
particular configuration.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
