Re: PATCH: Exclude unlogged tables from base backups

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Exclude unlogged tables from base backups
Date: 2017-12-12 23:08:50
Message-ID: CAB7nPqRW=rk_i8q1UhfDcNGvYpMVm4oTqwRacWHG=+5uRLi9UQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 13, 2017 at 8:04 AM, David Steele <david(at)pgmasters(dot)net> wrote:
> On 12/12/17 5:52 PM, Andres Freund wrote:
>> On 2017-12-12 17:49:54 -0500, David Steele wrote:
>>>
>>> Including unlogged relations in base backups takes up space and is
>>> wasteful since they are truncated during backup recovery.
>>>
>>> The attached patches exclude unlogged relations from base backups except
>>> for the init fork, which is required to recreate the main fork during
>>> recovery.
>>
>>
>> How do you reliably identify unlogged relations while writes are going
>> on? Without locks that sounds, uh, nontrivial?
>
>
> I don't think this is an issue. If the init fork exists it should be OK if
> it is torn since it will be recreated from WAL.

Yeah, I was just typing that until I saw your message.

> If the forks are written out of order (i.e. main before init), which is
> definitely possible, then I think worst case is some files will be backed up
> that don't need to be. The main fork is unlikely to be very large at that
> point so it doesn't seem like a big deal.

As far as I recall, the init forks are logged before the main forks. I
don't think we should rely on that ordering always holding, though.
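
To make the idea concrete, here is a minimal sketch of the selection logic
being discussed. It is not the patch itself. It assumes only the standard
relation file naming (relfilenode, optional _fsm/_vm/_init fork suffix,
optional .N segment suffix) and treats a relfilenode as unlogged when an init
fork shows up in the same directory listing. The function name and the sample
file names are hypothetical.

```python
import re

# Relation files look like "16384", "16384_fsm", "16384_init" or "16384.1".
# Assumption: anything that does not match (temporary relations, PG_VERSION,
# pg_filenode.map, ...) is not a candidate for exclusion and is always kept.
RELFILE_RE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?$")


def files_to_back_up(filenames):
    """Return the names to copy, skipping non-init forks of unlogged relations."""
    parsed = {name: RELFILE_RE.match(name) for name in filenames}

    # A relfilenode is treated as unlogged if its init fork is in the listing.
    unlogged = {
        m.group(1) for m in parsed.values() if m and m.group(2) == "init"
    }

    keep = []
    for name in filenames:
        m = parsed[name]
        if m and m.group(1) in unlogged and m.group(2) != "init":
            continue  # main, fsm or vm fork of an unlogged relation
        keep.append(name)
    return keep


listing = ["16384", "16384_init", "16384_fsm", "16384.1", "16385", "PG_VERSION"]
print(files_to_back_up(listing))
```

Because the decision is made from one listing, a main fork whose init fork
has not appeared yet is simply copied. That is the harmless worst case
described above: a few files get backed up that did not need to be.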

>>> I decided not to try and document unlogged exclusions in the continuous
>>> backup documentation yet (they are noted in the protocol docs). I would
>>> like to get some input on whether the community thinks this is a good
>>> idea. It's a non-trivial procedure that would be easy to misunderstand
>>> and does not affect the quality of the backup other than using less
>>> space. Thoughts?
>>
>>
>> Think it's a good idea, I've serious concerns about practicability of a
>> correct implementation though.
>
> Well, I would be happy if you had a look!

You can count me in. I think that this patch has value for some
dedicated workloads. It is a waste to back up stuff that will be
removed at recovery anyway.
--
Michael
