Re: block-level incremental backup

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-20 16:44:35
Message-ID: C3B78817-C247-44DB-AC56-ACDEF5F800BD@yandex-team.ru
Lists: pgsql-hackers

Hi!

Sorry for the delay.

> On 18 Apr 2019, at 21:56, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 17, 2019 at 5:20 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> As I understand it, the problem is not with backing up an individual
>> database or cluster, but rather dealing with backing up thousands of
>> individual clusters with thousands of tables in each, leading to an
>> awful lot of tables with lots of FSMs/VMs, all of which end up having to
>> get copied and stored wholesale. I'll point this thread out to him and
>> hopefully he'll have a chance to share more specific information.
>
> Sounds good.

When we introduced WAL-delta backups, we ran into two things:
1. A heavy spike in network load. We shift the start of each backup randomly, but the variation is not very big: the night is short and we want to take big backups during low-RPS hours. This low variation in the start times of many small backups creates a big network spike.
2. Incremental backups became very cheap when measured in the resources used on a single cluster.

The first is not a big problem, actually, but we realized that we can take incremental backups not just at night but, for example, 4 times a day. Or every hour. Or every minute. Why not, if they are cheap enough?
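
For illustration only, a minimal sketch (in Go, with made-up parameters) of jittering the start time over a wider window so that many small backups do not all start at once:

    package main

    import (
    	"fmt"
    	"math/rand"
    	"time"
    )

    // scheduleStart returns a jittered start time for the next backup.
    // base is the nominal start; window is how far the start may drift.
    // Widening the window flattens the aggregate network spike across clusters.
    func scheduleStart(base time.Time, window time.Duration) time.Time {
    	jitter := time.Duration(rand.Int63n(int64(window)))
    	return base.Add(jitter)
    }

    func main() {
    	base := time.Date(2019, 4, 20, 1, 0, 0, 0, time.UTC) // nominal nightly slot (example)
    	fmt.Println("backup starts at", scheduleStart(base, 4*time.Hour))
    }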

An incremental backup of a 1 TB database taken only a few minutes after the previous one (a small change set) is still a few GB. All of that size comes from the FSM (it has no LSNs) and the VM (its LSNs are hard to use).
Sure, this overhead is fine if we take daily backups. But at some backup frequency it becomes too much.
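
For context, an LSN-driven delta backup only copies pages whose LSN is newer than the LSN at which the previous backup started; FSM pages carry no usable LSN, so that check cannot help and the whole fork gets copied. A minimal sketch of such a check, assuming the standard 8 KB block size and little-endian on-disk page headers (this is not any tool's actual code):

    package main

    import (
    	"encoding/binary"
    	"fmt"
    )

    const blockSize = 8192 // standard PostgreSQL block size

    // pageLSN extracts pd_lsn from a page header. The first 8 bytes of a page
    // are {xlogid uint32, xrecoff uint32}; this sketch assumes little-endian layout.
    func pageLSN(page []byte) uint64 {
    	xlogid := binary.LittleEndian.Uint32(page[0:4])
    	xrecoff := binary.LittleEndian.Uint32(page[4:8])
    	return uint64(xlogid)<<32 | uint64(xrecoff)
    }

    // pageChangedSince reports whether a page must go into the increment:
    // it was modified after the LSN at which the previous backup started.
    // FSM pages defeat this check because their pd_lsn is not advanced on
    // ordinary updates, so those forks end up copied wholesale.
    func pageChangedSince(page []byte, sinceLSN uint64) bool {
    	return pageLSN(page) > sinceLSN
    }

    func main() {
    	page := make([]byte, blockSize)
    	binary.LittleEndian.PutUint32(page[0:4], 0x1)      // xlogid
    	binary.LittleEndian.PutUint32(page[4:8], 0x2A3B4C) // xrecoff
    	fmt.Println(pageChangedSince(page, 0x1_002A0000))  // true
    }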

I think the problem of making the FSM and VM incremental is too distant for now.
But if I had to implement it right now, I'd choose the following approach: do not back up the FSM and VM at all, and recreate them during restore. It looks possible, but it is very AM-specific.
That is hard when you write a backup tool in Go and cannot simply link with PG.
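
If the FSM and VM were skipped at backup time, the backup-side filter itself would be trivial; the hard, AM-specific part is rebuilding them on restore, which is not shown here. A rough sketch of the filter, assuming the usual "<relfilenode>_fsm" / "<relfilenode>_vm" fork naming with an optional segment suffix:

    package main

    import (
    	"fmt"
    	"strings"
    )

    // skipFork reports whether a relation file belongs to the free space map
    // or visibility map fork, e.g. "16384_fsm", "16384_vm", "16384_fsm.1".
    func skipFork(name string) bool {
    	base := name
    	if i := strings.LastIndexByte(base, '.'); i >= 0 {
    		base = base[:i] // drop the segment number, e.g. "16384_fsm.1" -> "16384_fsm"
    	}
    	return strings.HasSuffix(base, "_fsm") || strings.HasSuffix(base, "_vm")
    }

    func main() {
    	for _, f := range []string{"16384", "16384_fsm", "16384_vm", "16384_fsm.1"} {
    		fmt.Printf("%-12s skip=%v\n", f, skipFork(f))
    	}
    }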

> On 15 Apr 2019, at 18:01, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> ...the goal here
> isn't actually to make pg_basebackup into an enterprise backup tool,
> ...

BTW, I'm all for extensibility and "hackability". But, personally, I'd be happy if pg_basebackup were ubiquitous and sufficient, and tools like WAL-G and others became a part of history. There is no fundamental reason why an external backup tool can be better than a backup tool in core (unlike many PLs, data types, hooks, tuners, etc.).

This thread already has 53 mentions of "parallel backup". I want to note that there can be parallel reads from disk and parallel network transmission; everything between those two stages is negligible and can stay single-threaded. From my POV, it's not about threads, it's about saturated I/O controllers.
Also, I think parallel restore matters more than parallel backup. Backups themselves can be slow; on many clusters we even throttle their disk I/O. But users may want a parallel backup when it is used to bring up a standby and catch it up quickly.
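
To illustrate the point about saturating I/O rather than adding threads: a rough sketch of a small worker pool where N workers read files and push them to the network, while everything in between stays single-threaded per file. upload() and the file names are placeholders, not a real API:

    package main

    import (
    	"fmt"
    	"io"
    	"os"
    	"sync"
    )

    // upload is a stand-in for the network transmission step (hypothetical).
    func upload(name string, r io.Reader) error {
    	n, err := io.Copy(io.Discard, r) // pretend to send the bytes somewhere
    	fmt.Printf("sent %s (%d bytes)\n", name, n)
    	return err
    }

    // backupFiles copies files with `workers` concurrent readers/senders.
    // The goal is simply to keep the disk and network controllers busy;
    // the per-file work in between needs no extra parallelism.
    func backupFiles(files []string, workers int) {
    	jobs := make(chan string)
    	var wg sync.WaitGroup
    	for i := 0; i < workers; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			for name := range jobs {
    				f, err := os.Open(name)
    				if err != nil {
    					fmt.Fprintln(os.Stderr, err)
    					continue
    				}
    				_ = upload(name, f)
    				f.Close()
    			}
    		}()
    	}
    	for _, name := range files {
    		jobs <- name
    	}
    	close(jobs)
    	wg.Wait()
    }

    func main() {
    	backupFiles([]string{"/etc/hostname", "/etc/hosts"}, 2) // example paths
    }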

Thanks.

Best regards, Andrey Borodin.
