Re: PATCH: Exclude unlogged tables from base backups

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Adam Brightwell <adam(dot)brightwell(at)crunchydata(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Exclude unlogged tables from base backups
Date: 2018-01-29 10:28:22
Message-ID: CAD21AoAT8WgHYt0xXo7erdt=hcvbJrYtZCd96djKS_vZ8UD2Pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 26, 2018 at 4:58 AM, David Steele <david(at)pgmasters(dot)net> wrote:
> On 1/25/18 12:31 AM, Masahiko Sawada wrote:
>> On Thu, Jan 25, 2018 at 3:25 AM, David Steele <david(at)pgmasters(dot)net> wrote:
>>>>
>>>> Here is the first review comments.
>>>>
>>>> + unloggedDelim = strrchr(path, '/');
>>>>
>>>> I think it doesn't work fine on windows. How about using
>>>> last_dir_separator() instead?
>>>
>>> I think this function is OK on Windows -- we use it quite a bit.
>>> However, last_dir_separator() is clearer so I have changed it.
>>
>> Thank you for updating this. I was concerned about a separator
>> character '/' might not work fine on windows.
>
> Ah yes, I see what you mean now.
>
>> On Thu, Jan 25, 2018 at 6:23 AM, David Steele <david(at)pgmasters(dot)net> wrote:
>>> On 1/24/18 4:02 PM, Adam Brightwell wrote:
>>> Actually, I was talking to Stephen about this it seems like #3 would be
>>> more practical if we just stat'd the init fork for each relation file
>>> found. I doubt the stat would add a lot of overhead and we can track
>>> each unlogged relation in a hash table to reduce overhead even more.
>>>
>>
>> Can the readdir handle files that are added during the loop? I think
>> that we still cannot exclude a new unlogged relation if the relation
>> is added after we execute readdir first time. To completely eliminate
>> it we need a sort of lock that prevents to create new unlogged
>> relation from current backends. Or we need to do readdir loop multiple
>> times to see if no new relations were added during sending files.
>
> As far as I know readdir() is platform-dependent in terms of how it
> scans the dir and if files created after the opendir() will appear.
>
> It shouldn't matter, though, since WAL replay will recreate those files.

Yea, agreed.

>
>> If you're updating the patch to implement #3, this patch should be
>> marked as "Waiting on Author". After updated I'll review it again.
> Attached is a new patch that uses stat() to determine if the init fork
> for a relation file exists. I decided not to build a hash table as it
> could use considerable memory and I didn't think it would be much faster
> than a simple stat() call.
>
> The reinit.c refactor has been removed since it was no longer needed.
> I'll submit the tests I wrote for reinit.c as a separate patch for the
> next CF.
>

Thank you for updating the patch! The patch looks good to me. But I
have a question; can we exclude temp tables as well? The pg_basebackup
includes even temp tables. But I don't think that it's necessary for
backups.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-01-29 10:38:22 Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
Previous Message Kyotaro HORIGUCHI 2018-01-29 10:26:34 Re: [HACKERS] Restricting maximum keep segments by repslots