Re: PATCH: Exclude unlogged tables from base backups

From: David Steele <david(at)pgmasters(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: PATCH: Exclude unlogged tables from base backups
Date: 2017-12-13 12:34:36
Message-ID: d3f60151-03b3-9443-e07c-aaaf42e9dd55@pgmasters.net
Lists: pgsql-hackers

On 12/12/17 8:48 PM, Stephen Frost wrote:
> Andres,
>
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
>> On 2017-12-12 18:04:44 -0500, David Steele wrote:
>>> If the forks are written out of order (i.e. main before init), which is
>>> definitely possible, then I think worst case is some files will be backed up
>>> that don't need to be. The main fork is unlikely to be very large at that
>>> point so it doesn't seem like a big deal.
>>>
>>> I don't see this as any different than what happens during recovery. The
>>> unlogged forks are cleaned / re-inited before replay starts which is the
>>> same thing we are doing here.
>>
>> It's quite different - in the recovery case there's no other write
>> activity going on. But on a normally running cluster the persistence of
>> existing tables can get changed, and oids can get recycled. What
>> guarantees that between the time you checked for the init fork the table
>> hasn't been dropped, the oid reused and now a permanent relation is in
>> its place?
>
> We *are* actually talking about the recovery case here because this is a
> backup that's happening and WAL replay will be happening after the
> pg_basebackup is done and then the backup restored somewhere and PG
> started up again.
>
> If the persistence is changed then the table will be written into the
> WAL, no? All of the WAL generated during a backup (which is what we're
> talking about here) has to be replayed after the restore is done and is
> before the database is considered consistent, so none of this matters,
> as far as I can see, because the drop table or alter table logged or
> anything else will be in the WAL that ends up getting replayed.

Yes - that's the way I see it. At least when I'm not tired from a day
of coding like I was last night...

> I don't think there is, because, as David points out, the unlogged
> tables are cleaned up first and then WAL replay happens during recovery,
> so the init fork will cause the relation to be overwritten, but then
> later the logged 'drop table' and subsequent re-use of the relfilenode
> to create a new table (or persistence change) will all be in the WAL and
> will be replayed over top and will take care of this.

Files can be copied in any order, so if an OID is recycled the backup
could copy its first, second, or nth incarnation. It doesn't really
matter since all of it will be clobbered by WAL replay.

The new base backup code simply removes the non-init forks in advance,
following the same rules that recovery would apply to the same file
set.
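
To make the rule concrete, here is a minimal Python sketch of the
filtering as I understand it from the discussion above. It is not the
patch itself: the function name files_to_back_up and the regular
expression are illustrative assumptions, and it only handles plain
relation file names of the form relfilenode[_fork][.segment] within a
single database directory.

```python
import re

# Assumed file name layout: "<relfilenode>[_<fork>][.<segment>]",
# e.g. "16384", "16384_init", "16384_fsm", "16384.1".
# Anything that does not match (e.g. "PG_VERSION") is treated as a
# non-relation file and always kept.
_REL_FILE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?$")


def files_to_back_up(filenames):
    """Return the names kept from one database directory (hypothetical helper)."""
    names = list(filenames)
    parsed = {name: _REL_FILE.match(name) for name in names}

    # Pass 1: a relfilenode with an init fork is treated as unlogged.
    unlogged = {
        m.group(1) for m in parsed.values() if m and m.group(2) == "init"
    }

    # Pass 2: for unlogged relations keep only the init fork; keep
    # everything else untouched.
    kept = []
    for name in names:
        m = parsed[name]
        if m and m.group(1) in unlogged and m.group(2) != "init":
            continue  # main/fsm/vm fork of an unlogged relation: skip it
        kept.append(name)
    return kept
```

Given "16384", "16384_init", "16384_fsm" and "16385", this keeps only
"16384_init" and "16385". If the init fork has not been written yet
when the directory is scanned, the other forks are simply kept, which
is the "some files will be backed up that don't need to be" case
mentioned earlier in the thread.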

--
-David
david(at)pgmasters(dot)net
