Re: Add notes to pg_combinebackup docs

From: David Steele <david(at)pgmasters(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Martín Marqués <martin(dot)marques(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add notes to pg_combinebackup docs
Date: 2024-04-12 09:50:59
Message-ID: e3e373de-e015-4062-b66a-c3e220ef06e8@pgmasters.net
Lists: pgsql-hackers

On 4/12/24 19:09, Magnus Hagander wrote:
> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david(at)pgmasters(dot)net> wrote:
>
> OK, sure, but if the plan is to make it practical later doesn't that
> make the feature something to be avoided now?
>
>
> That could be said for any feature. When we shipped streaming
> replication, the plan was to support synchronous in the future. Should
> we not have shipped it, or told people to avoid it?

This doesn't seem like a great example. Asynchronous rep is by far the
more used mode in my experience. I actively dissuade people from using
sync rep because of the downsides. More people think they need it than
actually need it.

> > However, who says this has to be the filesystem the Postgres instance
> > runs on? Who in their right mind put backups on the same volume as the
> > instance anyway? At which point it can be a different filesystem, even
> > if it's not ideal for running the database.
>
> My experience is these days backups are generally placed in object
> stores. Sure, people are still using NFS but admins rarely have much
> control over those volumes. They may or may not be COW filesystems.
>
>
> If it's mounted through NFS I assume pg_combinebackup won't actually be
> able to use the COW features? Or does that actually work through NFS?

Pretty sure it won't work via NFS, but I was wrong about XFS, so...

> Mounted LUNs on a SAN I find more common today though, and there it
> would do a fine job.

Huh, interesting. This is a case I almost never see anymore.

> > All of this also depends on how people do the restore. With the CoW
> > stuff they can do a quick (and small) copy on the backup server, and
> > then copy the result to the actual instance. Or they can do restore on
> > the target directly (e.g. by mounting a r/o volume with backups), in
> > which case the CoW won't really help.
>
> And again, this all requires a significant amount of setup and tooling.
> Obviously I believe good backup requires effort but doing this right
> gets very complicated due to the limitations of the tool.
>
> It clearly needs to be documented that there are space needs. But
> temporarily getting space for something like that is not very
> complicated in most environments. But you do have to be aware of it.

We find many environments ridiculously tight on space. There is a
constant fight with customers/users to even get the data/WAL volumes
sized correctly.

For small databases it is probably not an issue, but this feature really
shines with very large databases.
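
To put rough numbers on the space concern, here is a back-of-the-envelope
sketch. It is only an illustration under stated assumptions, not anything
pg_combinebackup computes itself: the function name, the example sizes, and
the sizing rules (every backup in the chain present as an expanded
directory, combined output about the size of the full backup, a CoW clone
adding roughly nothing) are all assumed for the sake of the example.

```python
def scratch_space_needed(full_bytes, incremental_bytes, cow_output=False):
    """Rough peak disk usage for combining a chain of expanded backups.

    Assumptions (illustrative only, not taken from pg_combinebackup):
    - the full backup and every incremental are expanded on disk
    - the combined output is about as large as the full backup
    - with copy-on-write cloning the output shares blocks with the
      inputs, so it is counted as zero extra space (optimistic)
    """
    inputs = full_bytes + sum(incremental_bytes)
    output = 0 if cow_output else full_bytes
    return inputs + output


TIB = 1024 ** 4
GIB = 1024 ** 3

# Hypothetical chain: one 10 TiB full backup plus six 50 GiB incrementals.
chain = [50 * GIB] * 6
print(scratch_space_needed(10 * TIB, chain) / TIB)
print(scratch_space_needed(10 * TIB, chain, cow_output=True) / TIB)
```

Under these assumptions the non-CoW case needs roughly twice the size of
the full backup in scratch space, which is exactly where the tight
environments run into trouble.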

> Generally speaking it's already the case that the "restore experience"
> with pg_basebackup is far from great. We don't have a "pg_baserestore".
> You still have to deal with archive_command and restore_command, which
> we all know can be easy to get wrong. I don't see how this is
> fundamentally worse than that.

I pretty much agree with this statement. pg_basebackup is already hard
to use effectively. Now it is just optionally harder.

> Personally, I tend to recommend that "if you want PITR and thus need to
> mess with archive_command etc, you should use a backup tool like
> pg_backrest. If you're fine with just daily backups or whatnot, use
> pg_basebackup". The incremental backup story fits somewhere in between,
> but I'd still say this is (today) primarily a tool directed at those
> that don't need full PITR.

Yeah, there are certainly cases where PITR is not required, but they
still seem to be in the minority. PITR cannot be disabled for the most
recent backup in pgBackRest and we've had few complaints about that overall.

> > But yeah, having to keep the backups as expanded directories is not
> > great, I'd love to have .tar. Not necessarily because of the disk space
> > (in my experience the compression in filesystems works quite well for
> > this purpose), but mostly because it's more compact and allows working
> > with backups as a single piece of data (e.g. it's much clearer what the
> > checksum of a single .tar is, compared to a directory).
>
> But again, object stores are commonly used for backup these days and
> billing is based on data stored rather than any compression that can be
> done on the data. Of course, you'd want to store the compressed tars in
> the object store, but that does mean storing an expanded copy somewhere
> to do pg_combinebackup.
>
> Object stores are definitely getting more common. I wish they were
> getting a lot more common than they actually are, because they simplify
> a lot.  But they're in my experience still very far from being a majority.

I see it the other way, especially the last few years. The majority seem
to be object stores followed up closely by NFS. Directly mounted storage
on the backup host appears to be rarer.

> But if the argument is that all this can/will be fixed in the future, I
> guess the smart thing for users to do is wait a few releases for
> incremental backups to become a practical feature.
>
> There's always going to be another set of goalposts further ahead. I
> think it can still be practical for quite a few people.

Since barman uses pg_basebackup in certain cases I imagine that will end
up being the way most users access this feature.

> I'm more worried about the issue you raised in the other thread about
> missing files, for example...

Me, too.

Regards,
-David
