Re: PATCH: Exclude unlogged tables from base backups

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Exclude unlogged tables from base backups
Date: 2018-01-22 08:14:29
Message-ID: CAD21AoDL_71pWPbTvja+MMVK4F6jMaYU=2asWY9JCRxX6qWetQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Dec 12, 2017 at 8:48 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> If the persistence is changed then the table will be written into the
>> WAL, no? All of the WAL generated during a backup (which is what we're
>> talking about here) has to be replayed after the restore is done and is
>> before the database is considered consistent, so none of this matters,
>> as far as I can see, because the drop table or alter table logged or
>> anything else will be in the WAL that ends up getting replayed.
>
> I can't see a hole in this argument. If we copy the init fork and
> skip copying the main fork, then either we skipped copying the right
> file, or the file we skipped copying will be recreated with the
> correct contents during WAL replay anyway.
>
> We could have a problem if wal_level=minimal, because then the new
> file might not have been WAL-logged; but taking an online backup with
> wal_level=minimal isn't supported precisely because we won't have WAL
> replay to fix things up.
>
> We would also have a problem if the missing file caused something in
> recovery to croak on the grounds that the file was expected to be
> there, but I don't think anything works that way; I think we just
> assume missing files are an expected failure mode and silently do
> nothing if asked to remove them.
>

I also couldn't see a problem in this approach.

Here is the first review comments.

+ unloggedDelim = strrchr(path, '/');

I think it doesn't work fine on windows. How about using
last_dir_separator() instead?

----
+ * Find all unlogged relations in the specified directory and return
their OIDs.

What the ResetUnloggedrelationsHash() actually returns is a hash
table. The comment of this function seems not appropriate.

----
+ /* Part of path that contains the parent directory. */
+ int parentPathLen = unloggedDelim - path;
+
+ /*
+ * Build the unlogged relation hash if the parent path is either
+ * $PGDATA/base or a tablespace version path.
+ */
+ if (strncmp(path, "./base", parentPathLen) == 0 ||
+ (parentPathLen >=
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+ strncmp(unloggedDelim -
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+ TABLESPACE_VERSION_DIRECTORY,
+
sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+ unloggedHash = ResetUnloggedRelationsHash(path);
+ }

How about using get_parent_directory() to get parent directory name?
Also, I think it's better to destroy the unloggedHash after use.

----
+ /* Exclude all forks for unlogged tables except the
init fork. */
+ if (unloggedHash && ResetUnloggedRelationsMatch(
+ unloggedHash, de->d_name) == unloggedOther)
+ {
+ elog(DEBUG2, "unlogged relation file \"%s\"
excluded from backup",
+ de->d_name);
+ continue;
+ }

I think it's better to log this debug message at DEBUG2 level for
consistency with other messages.

----
+ ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+ 'unlogged imain fork not in tablespace backup');

s/imain/main/

----
If a new unlogged relation is created after constructed the
unloggedHash before sending file, we cannot exclude such relation. It
would not be problem if the taking backup is not long because the new
unlogged relation unlikely becomes so large. However, if takeing a
backup takes a long time, we could include large main fork in the
backup.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rady, Doug 2018-01-22 08:34:06 PATCH: pgbench - option to build using ppoll() for larger connection counts
Previous Message John Naylor 2018-01-22 08:07:35 stricter MCV tests for uniform distributions (was Re: MCV lists for highly skewed distributions)