Re: block-level incremental backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: block-level incremental backup
Date: 2019-09-17 14:55:04
Message-ID: CA+TgmobCumfTmpoiy-cVzEcabEhPinhJ6KpOAg-MfP4d73b+TQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> As discussed nearby, not everything that needs to be included in the
> backup is actually going to be in the WAL though, right? How would that
> ever be able to handle the case where someone starts the server under
> wal_level = logical, takes a full backup, then restarts with wal_level =
> minimal, writes out a bunch of new data, and then restarts back to
> wal_level = logical and takes an incremental?

Fair point. I think the WAL-scanning approach can only work if
wal_level > minimal. But, I also think that few people run with
wal_level = minimal in this era where the default has been changed to
replica; and I think we can detect the WAL level in use while scanning
WAL. It can only change at a checkpoint.
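To make the detection step concrete, here is a minimal sketch. It assumes WAL has already been decoded into a hypothetical `WalRecord` type, and that parameter-change records carry the new wal_level (PostgreSQL does write an XLOG_PARAMETER_CHANGE record when settings such as wal_level change, but the binary record format is abstracted away here). If wal_level was ever `minimal` in the range, the sketch refuses to trust a WAL-derived incremental.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical decoded-record model; real WAL is binary and is read with the
# server's WAL reader. Only the fields this check needs are represented.
@dataclass(frozen=True)
class WalRecord:
    lsn: int
    new_wal_level: Optional[str] = None  # set only on parameter-change records


def wal_range_is_complete(records: Iterable[WalRecord], start_level: str) -> bool:
    """Return True if wal_level stayed above 'minimal' across the whole range.

    start_level is the wal_level in effect at the start of the range, e.g. the
    level in effect when the previous backup was taken. If 'minimal' was in
    effect at any point, some changes may have bypassed WAL, so a WAL-derived
    incremental backup cannot be trusted for this range.
    """
    if start_level == "minimal":
        return False
    for rec in records:
        if rec.new_wal_level == "minimal":
            return False
    return True


if __name__ == "__main__":
    ok = [WalRecord(100), WalRecord(200, new_wal_level="logical")]
    bad = [WalRecord(100), WalRecord(200, new_wal_level="minimal"),
           WalRecord(300, new_wal_level="logical")]
    print(wal_range_is_complete(ok, "replica"))   # True
    print(wal_range_is_complete(bad, "logical"))  # False
```

The second call is the scenario from the quoted message (logical, then minimal, then logical again). Switching back to logical does not help, because whatever was written under minimal is already missing from WAL.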

> On larger systems, so many of the files are 1GB in size that checking
> the file size is quite close to meaningless. Yes, having to checksum
> all of the files definitely adds to the cost of taking the backup, but
> to avoid it we need strong assurances that a given file hasn't been
> changed since our last full backup. WAL, today at least, isn't quite
> that, and timestamps can possibly be fooled with, so if you'd like to be
> particularly careful, there doesn't seem to be a lot of alternatives.

I see your points, but it feels like you're trying to talk down the
WAL-based approach over what seem to me to be fairly manageable corner
cases.

> I'm not asking you to be an expert on those systems, just to help me
> understand the statements you're making. How is backing up to a
> pgbackrest repo different than running a pg_basebackup in the context of
> using some other Enterprise backup system? In both cases, you'll have a
> full copy of the backup (presumably compressed) somewhere out on a disk
> or filesystem which is then backed up by the Enterprise tool.

Well, I think that what people really want is to be able to back up
straight into the enterprise tool, without an intermediate step.

My basic point here is: As with practically all PostgreSQL
development, I think we should try to expose capabilities and avoid
making policy on behalf of users.

I'm not objecting to the idea of having tools that can help users
figure out how much WAL they need to retain -- but insofar as we can
do it, such tools should work regardless of where that WAL is actually
stored. I dislike the idea that PostgreSQL would provide something
akin to a "pgbackrest repository" in core, or at least I think it
would be important that we're careful about how much functionality
gets tied to the presence and use of such a thing, because, at least
based on my experience working at EnterpriseDB, larger customers often
don't want to do it that way.
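As an illustration of a storage-agnostic building block for such a tool, the sketch below maps an LSN (for instance, the starting WAL location of the oldest backup you intend to keep) to the name of the WAL segment file containing it, i.e. the oldest segment that must be retained. It follows the server's timeline/log/segment naming scheme and assumes the default 16MB wal_segment_size and a known timeline ID; where that file actually lives (pg_wal, an archive directory, or an enterprise backup system) is left to the caller.

```python
DEFAULT_WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(text: str) -> int:
    """Parse an LSN in PostgreSQL's 'X/Y' hexadecimal notation into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(lsn: str, timeline: int,
                  seg_size: int = DEFAULT_WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid,
                             segno % segs_per_xlogid)


if __name__ == "__main__":
    # Oldest segment to keep for a backup that started at 0/3000028 on timeline 1.
    print(wal_file_name("0/3000028", 1))  # 000000010000000000000003
```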

> That's not great, of course, which is why there are trade-offs to be
> made, one of which typically involves using timestamps, but doing so
> quite carefully, to perform the file exclusion. Other ideas are great
> but it seems like WAL isn't really a great idea unless we make some
> changes there and we, as in PG, haven't got a robust "we know this file
> changed as of this point" to work from. I worry that we're putting too
> much faith into a system to do something independent of what it was
> actually built and designed to do, and thinking that because we could
> trust it for X, we can trust it for Y.

That seems like a considerable overreaction to me based on the
problems reported thus far. The fact is, WAL was originally intended
for crash recovery and has subsequently been generalized to be usable
for point-in-time recovery, standby servers, and logical decoding.
It's clearly established at this point as the canonical way that you
know what in the database has changed, which is the same need that we
have for incremental backup.
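A rough sketch of what "knowing what has changed" from WAL could look like for incremental backup: most WAL records identify the blocks they modify, so scanning WAL from the previous backup's start LSN yields the set of blocks that need copying. The `WalRecord` and `BlockRef` types here are hypothetical simplifications, and a real implementation would also have to handle records that don't reference individual blocks, such as relation creation and truncation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical simplified block reference: (relfilenode, fork name, block number).
BlockRef = Tuple[int, str, int]


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    block_refs: Tuple[BlockRef, ...] = ()


def changed_blocks(records: Iterable[WalRecord],
                   since_lsn: int) -> Dict[Tuple[int, str], Set[int]]:
    """Map each (relfilenode, fork) to the blocks modified at or after since_lsn."""
    changed: Dict[Tuple[int, str], Set[int]] = defaultdict(set)
    for rec in records:
        if rec.lsn < since_lsn:
            continue  # already reflected in the previous backup
        for relfilenode, fork, blkno in rec.block_refs:
            changed[(relfilenode, fork)].add(blkno)
    return dict(changed)


if __name__ == "__main__":
    wal = [
        WalRecord(10, ((16384, "main", 0),)),
        WalRecord(20, ((16384, "main", 5), (16385, "main", 1))),
        WalRecord(30, ((16384, "main", 5),)),
    ]
    # Previous backup started at LSN 15, so the record at LSN 10 is skipped.
    print(changed_blocks(wal, since_lsn=15))
    # {(16384, 'main'): {5}, (16385, 'main'): {1}}
```

Block 5 is touched twice but only needs to be copied once; the incremental backup would then read just those blocks from each file instead of checksumming whole 1GB segments.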

At any rate, the same criticism can be leveled - IMHO with a lot more
validity - at timestamps. Last-modification timestamps are completely
outside of our control; they are owned by the OS and various operating
systems can and do have varying behavior. They can go backwards when
things have changed; they can go forwards when things have not
changed. They were clearly not intended to meet this kind of
requirement. Indeed, they were intended for that purpose far less
than WAL, which was actually designed for a requirement in this
general ballpark, if not this exact one.
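A small illustration of both failure modes, using a naive "copy the file if its mtime is newer than the last backup" rule. The clock steps are simulated with os.utime(); the rule and the file names are illustrative assumptions, not any particular backup tool's logic.

```python
import os
import tempfile
import time
from typing import Tuple


def mtime_rule_selects(path: str, last_backup_time: float) -> bool:
    """Naive rule: include the file only if it was modified after the last backup."""
    return os.stat(path).st_mtime > last_backup_time


def demo() -> Tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        changed = os.path.join(d, "16384")    # hypothetical relation files
        unchanged = os.path.join(d, "16385")
        for path in (changed, unchanged):
            with open(path, "wb") as f:
                f.write(b"\0" * 8192)
        last_backup = time.time()

        # The first file is modified after the backup, but its mtime then moves
        # backwards (simulating a clock that was stepped back).
        with open(changed, "r+b") as f:
            f.write(b"\x01")
        os.utime(changed, (last_backup - 3600, last_backup - 3600))

        # The second file is never modified, but its mtime moves forwards.
        os.utime(unchanged, (last_backup + 3600, last_backup + 3600))

        return (mtime_rule_selects(changed, last_backup),
                mtime_rule_selects(unchanged, last_backup))


if __name__ == "__main__":
    print(demo())  # (False, True): changed file skipped, unchanged file copied
```

The first case is the dangerous one, since it silently produces a backup that is missing data; the second only wastes I/O.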

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
