Re: Add notes to pg_combinebackup docs

From: David Steele <david(at)pgmasters(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Martín Marqués <martin(dot)marques(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add notes to pg_combinebackup docs
Date: 2024-04-12 22:44:29
Message-ID: f0fd7c33-645f-467e-a5f9-0a875c4c035c@pgmasters.net
Lists: pgsql-hackers

On 4/12/24 22:40, Tomas Vondra wrote:
> On 4/12/24 11:50, David Steele wrote:
>> On 4/12/24 19:09, Magnus Hagander wrote:
>>> On Fri, Apr 12, 2024 at 12:14 AM David Steele <david(at)pgmasters(dot)net
>>>
>>> ...>>
>>>      > But yeah, having to keep the backups as expanded directories is not
>>>      > great, I'd love to have .tar. Not necessarily because of the disk space
>>>      > (in my experience the compression in filesystems works quite well for
>>>      > this purpose), but mostly because it's more compact and allows working
>>>      > with backups as a single piece of data (e.g. it's much clearer what the
>>>      > checksum of a single .tar is, compared to a directory).
>>>
>>>     But again, object stores are commonly used for backup these days and
>>>     billing is based on data stored rather than any compression that can be
>>>     done on the data. Of course, you'd want to store the compressed tars in
>>>     the object store, but that does mean storing an expanded copy somewhere
>>>     to do pg_combinebackup.
>>>
>>> Object stores are definitely getting more common. I wish they were
>>> getting a lot more common than they actually are, because they
>>> simplify a lot.  But they're in my experience still very far from
>>> being a majority.
>>
>> I see it the other way, especially the last few years. The majority seem
>> to be object stores followed up closely by NFS. Directly mounted storage
>> on the backup host appears to be rarer.
>>
>
> One thing I'd mention is that not having built-in support for .tar and
> .tgz backups does not mean it's impossible to use pg_combinebackup with
> archives. You can mount them using e.g. "ratarmount" and then use that
> as source directories for pg_combinebackup.
>
> It's not entirely friction-less because AFAICS it's necessary to do the
> backup in plain format and then do the .tar to have the expected "flat"
> directory structure (and not manifest + 2x tar). But other than that it
> seems to work fine (based on my limited testing).
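
A rough sketch of the workaround described above, for illustration only. It builds the commands without running them. It assumes each .tar holds the flat data directory of one backup at its root, that the archives are listed oldest first (full backup, then incrementals), and that the mount point directories already exist. The function name and the example paths are hypothetical.

```python
import shlex


def combine_from_tars_commands(tar_paths, mount_root, output_dir):
    """Return the commands to mount each tar and combine the mounted backups.

    tar_paths must be ordered oldest first: the full backup, then each
    incremental backup in the order it was taken.
    """
    # One (hypothetical) mount point per archive: <mount_root>/0, <mount_root>/1, ...
    mounts = [f"{mount_root}/{i}" for i, _ in enumerate(tar_paths)]
    # ratarmount takes the archive followed by the mount point.
    commands = [["ratarmount", tar, mnt] for tar, mnt in zip(tar_paths, mounts)]
    # pg_combinebackup takes the backup directories in order, plus -o for output.
    commands.append(["pg_combinebackup", "-o", output_dir, *mounts])
    return commands


if __name__ == "__main__":
    for cmd in combine_from_tars_commands(
        ["/backups/full.tar", "/backups/incr1.tar"], "/mnt/backups", "/restore/data"
    ):
        print(shlex.join(cmd))
```

The mounts still have to be cleaned up afterwards, and nothing here addresses how the mounted archives perform.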

Well, that's certainly convoluted and doesn't really help a lot in terms
of space consumption; it just shifts the additional space required to
the backup side. I doubt this is something we'd be willing to add to our
documentation, so it would be up to the user to figure out and script.

> FWIW the "archivemount" performs terribly, so adding this capability
> into pg_combinebackup is clearly far from trivial.

I imagine this would perform pretty badly. And yes, doing it efficiently
is not trivial but certainly doable. Scanning the tar file and matching
to entries in the manifest is one way, but I would prefer to store the
offsets into the tar file in the manifest then assemble an ordered list
of work to do on each tar file. But of course the latter requires a
manifest-centric approach, which is not what we have right now.
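
To make the offset idea concrete, here is a minimal sketch using Python's standard tarfile module. It is not how pg_combinebackup or the backup manifest works today. It assumes an uncompressed tar (a .tgz cannot be seeked into this way), and all function names are hypothetical. It records each member's data offset, the kind of value that could in principle be stored in a manifest, and then orders the requested files by offset so the tar is read front to back.

```python
import io
import tarfile


def index_tar(fileobj):
    """Map member name -> (data offset, size) for regular files in a plain tar."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def plan_reads(index, wanted):
    """Return (offset, size, name) tuples for the wanted files, sorted by offset."""
    return sorted(
        (index[name][0], index[name][1], name) for name in wanted if name in index
    )


def read_at(fileobj, offset, size):
    """Read one member's data directly, without scanning the tar headers."""
    fileobj.seek(offset)
    return fileobj.read(size)


def make_demo_tar(files):
    """Build a small in-memory tar so the sketch can run on its own."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    files = {"base/1/16384": b"aaaa", "PG_VERSION": b"17\n"}
    tar_file = make_demo_tar(files)
    index = index_tar(tar_file)
    for offset, size, name in plan_reads(index, ["PG_VERSION", "base/1/16384"]):
        print(name, offset, size, read_at(tar_file, offset, size))
```

With the offsets stored up front, the scan in index_tar would not be needed at restore time; only plan_reads and read_at would run against each tar file.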

Regards,
-David
