Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-04-20 20:13:42
Message-ID: CA+TgmoaiuqXPJD3JwhTh2xoJm3pVEyEvOg8zR0hu9UdRUB=+iA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 20, 2019 at 12:44 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> An incremental backup of a 1 TB database taken a few minutes after the previous one (a small change set) is a few GB. All of that size comes from the FSM (no LSN) and the VM (LSN hard to use).
> Sure, this overhead is fine if we take daily backups, but above some backup frequency it becomes too much.

It seems like if the backups are only a few minutes apart, PITR might
be a better choice than super-frequent incremental backups. What do
you think about that?
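For readers following the FSM/VM point quoted above: an LSN-based incremental backup copies a block only if its page LSN is at or past the start LSN of the previous backup, so a fork whose pages carry no usable LSN has to be copied in full. Below is a minimal Python sketch under that assumption. The Page type and blocks_to_copy() are hypothetical and are not the patch under discussion; treating every VM page as unfilterable simply follows the quoted description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    block_no: int
    lsn: int  # page LSN, represented here as a plain integer (hypothetical)


def blocks_to_copy(fork: str, pages: list[Page], start_lsn: int) -> list[int]:
    """Return the block numbers an LSN-based incremental backup would copy.

    Assumption for this sketch: main-fork pages carry a trustworthy LSN,
    while the "fsm" and "vm" forks are treated as having no usable LSN,
    so every block in them is copied regardless of start_lsn.
    """
    if fork in ("fsm", "vm"):
        # No LSN to filter on: the whole fork goes into the backup.
        return [p.block_no for p in pages]
    # Copy only pages modified at or after the previous backup's start LSN.
    return [p.block_no for p in pages if p.lsn >= start_lsn]
```

For example, main-fork pages with LSNs 100, 250 and 300 and a start LSN of 200 yield blocks 1 and 2, while the same pages in the "fsm" fork are all copied. That unconditional copy is the overhead that dominates when the change set between backups is small.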

> I think the problem of backing up the FSM and VM incrementally is too distant for now.
> But if I had to implement it right now, I'd choose the following approach: don't back up the FSM and VM at all; recreate them during restore. That looks possible, but it's too AM-specific.

Interesting idea - that's worth some more thought.
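Andrey's suggestion of skipping the FSM and VM and recreating them at restore time could start with a file filter like the one below. This is a sketch assuming the usual on-disk naming of relation files (<relfilenode>, an optional _fsm/_vm/_init fork suffix, an optional .N segment number); should_back_up() is hypothetical, and the AM-specific rebuild step on restore is not shown.

```python
import re

# Relation data files are assumed to be named <relfilenode>[_<fork>][.<segment>],
# e.g. "16384", "16384.1", "16384_fsm", "16384_vm.1", "16384_init".
_FORK_FILE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?$")


def should_back_up(filename: str) -> bool:
    """Return False for FSM/VM fork files, which this sketch assumes are
    rebuilt during restore instead of being copied into the backup."""
    m = _FORK_FILE.match(filename)
    if m is None:
        # Not a relation data file (e.g. PG_VERSION): copy it unchanged.
        return True
    # Main fork (group is None) and init fork are kept; fsm/vm are skipped.
    return m.group(2) not in ("fsm", "vm")
```

The filter is the easy part. The hard part, as noted in the quoted text, is that rebuilding these forks on restore depends on the access method.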

> BTW, I'm all for extensibility and "hackability". But, personally, I'd be happy if pg_basebackup became ubiquitous and sufficient, and tools like WAL-G and others became part of history. There is no fundamental reason why an external backup tool can be better than the backup tool in core (unlike many PLs, data types, hooks, tuners, etc.).

+1

> There are 53 mentions of "parallel backup" here. I want to note that there can be parallel reads from disk and parallel network transmission. The work between these two is negligible and can be single-threaded. From my POV, it's not about threads; it's about saturating the I/O controllers.
> Also, I think parallel restore matters more than parallel backup. Backups themselves can be slow; on many clusters we even throttle disk I/O. But users may want parallel backup to catch up a standby.

I'm not sure I entirely understand your point here -- are you saying
that parallel backup is important, or that it's not important, or
something in between? Do you think it's more or less important than
incremental backup?
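The shape Andrey describes (parallel reads, a single-threaded middle stage, parallel sends) can be sketched with threads and queues. In this hypothetical Python sketch, read_block() and send_chunk() stand in for disk and network I/O, and the middle stage does only trivial per-block work. It assumes read_block() never raises, since a failed read would leave the middle loop waiting forever.

```python
import queue
import threading


def run_pipeline(blocks, read_block, send_chunk, n_readers=4, n_senders=4):
    """Sketch: parallel reads -> single-threaded middle stage -> parallel sends.

    read_block and send_chunk are hypothetical callables standing in for
    disk I/O and network I/O respectively.
    """
    blocks = list(blocks)
    work = queue.Queue()
    read_q = queue.Queue()
    send_q = queue.Queue()
    for b in blocks:
        work.put(b)

    def reader():
        # Each reader pulls block numbers until the work queue is drained.
        while True:
            try:
                b = work.get_nowait()
            except queue.Empty:
                return
            read_q.put((b, read_block(b)))

    def sender():
        # Each sender ships packaged blocks until it sees the None sentinel.
        while True:
            item = send_q.get()
            if item is None:
                return
            send_chunk(item)

    readers = [threading.Thread(target=reader) for _ in range(n_readers)]
    senders = [threading.Thread(target=sender) for _ in range(n_senders)]
    for t in readers + senders:
        t.start()

    # Single-threaded middle stage: cheap per-block work (here just packaging).
    for _ in range(len(blocks)):
        b, data = read_q.get()
        send_q.put((b, data))

    for t in readers:
        t.join()
    for _ in senders:
        send_q.put(None)  # one shutdown sentinel per sender
    for t in senders:
        t.join()
```

Only the two I/O stages fan out here. That matches the claim that what matters is keeping the I/O controllers saturated, not parallelizing the cheap work in between.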

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
