Re: PATCH: Exclude unlogged tables from base backups

From: David Steele <david(at)pgmasters(dot)net>
To: Adam Brightwell <adam(dot)brightwell(at)crunchydata(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Exclude unlogged tables from base backups
Date: 2018-01-24 21:23:12
Message-ID: 3a0be571-dc15-7384-849e-ad8f69412986@pgmasters.net
Lists: pgsql-hackers

On 1/24/18 4:02 PM, Adam Brightwell wrote:
>>> If a new unlogged relation is created after the unloggedHash is
>>> constructed but before its files are sent, we cannot exclude that
>>> relation. That would not be a problem if the backup is quick, because
>>> the new unlogged relation is unlikely to become very large. However,
>>> if taking the backup takes a long time, we could include a large main
>>> fork in the backup.
>>
>> This is a good point. It's per database directory which makes it a
>> little better, but maybe not by much.
>>
>> Three options here:
>>
>> 1) Leave it as is knowing that unlogged relations created during the
>> backup may be copied and document it that way.
>>
>> 2) Construct a list for SendDir() to work against so the gap between
>> creating that and creating the unlogged hash is as small as possible.
>> The downside here is that the list may be very large and take up a lot
>> of memory.
>>
>> 3) Check each file that looks like a relation in the loop to see if it
>> has an init fork. This might affect performance since an
>> opendir/readdir loop would be required for every relation.
>>
>> Personally, I'm in favor of #1, at least for the time being. I've
>> updated the docs as indicated in case you and Adam agree.
>
> I agree with #1 and feel the updated docs are reasonable and
> sufficient to address this case for now.
>
> I have retested these patches against master at d6ab720360.
>
> All tests succeed.
>
> Marking "Ready for Committer".

Thanks, Adam!

Actually, I was talking to Stephen about this and it seems like #3 would
be more practical if we just stat'd the init fork for each relation file
found. I doubt the stat would add a lot of overhead, and we can track
each unlogged relation in a hash table to reduce overhead even more.

I'll look at that tomorrow and see if I can work out something practical.
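
To make the stat-based variant of #3 concrete, here is a rough standalone
sketch, not the patch itself. It assumes relation files are named
<relfilenode>[_<fork>][.<segment>] and that an unlogged relation is
identified by the presence of <relfilenode>_init in the same directory.
The helper names (parse_relfilenode, has_init_fork, should_skip_file) are
made up for illustration and are not functions in the PostgreSQL tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: copy the leading run of digits (the relfilenode)
 * out of a file name.  Returns false if the name does not look like a
 * relation file, i.e. it does not start with digits followed by end of
 * string, '_' (fork suffix) or '.' (segment suffix).
 */
static bool
parse_relfilenode(const char *name, char *out, size_t outlen)
{
    size_t n = 0;

    assert(outlen > 0);
    while (isdigit((unsigned char) name[n]))
        n++;
    if (n == 0 || n >= outlen)
        return false;
    if (name[n] != '\0' && name[n] != '_' && name[n] != '.')
        return false;
    memcpy(out, name, n);
    out[n] = '\0';
    return true;
}

/* Hypothetical helper: true if "<dir>/<relfilenode>_init" exists. */
static bool
has_init_fork(const char *dir, const char *relfilenode)
{
    char        path[1024];
    struct stat st;
    int         len;

    len = snprintf(path, sizeof(path), "%s/%s_init", dir, relfilenode);
    if (len < 0 || (size_t) len >= sizeof(path))
        return false;           /* path too long: be safe, do not skip */
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/*
 * Decide whether a file seen while walking a database directory can be
 * left out of the backup: it looks like a relation file, it is not the
 * init fork itself, and an init fork exists for the same relfilenode.
 */
static bool
should_skip_file(const char *dir, const char *name)
{
    char relfilenode[32];

    if (!parse_relfilenode(name, relfilenode, sizeof(relfilenode)))
        return false;
    /* The init fork must stay in the backup so the relation can be reset. */
    if (strcmp(name + strlen(relfilenode), "_init") == 0)
        return false;
    return has_init_fork(dir, relfilenode);
}
```

This does one stat() per file; the hash table mentioned above would cache
the answer per relfilenode so that additional forks and segments of the
same relation do not trigger another stat().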

--
-David
david(at)pgmasters(dot)net
