Re: pg_basebackup stream xlog to tar

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup stream xlog to tar
Date: 2016-09-05 08:01:42
Message-ID: CAB7nPqRmGuyVB6pav5Ee=H+DM-XaNM98e-xeG4xV4wb6Urk+CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 3, 2016 at 10:35 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Ugh. That would be nice to have, but I think that's outside the scope of
> this patch.

A test for this patch that could have value would be to use
pg_basebackup -X stream -Ft, then untar pg_xlog.tar and look at the
size of the segments. If you have an idea to untar something without
the in-core perl support because we need to have the TAP stuff able to
run on at least 5.8.8, I'd welcome an idea. Honestly I still have
none, and that's why the recovery tests do not use tarballs in their
tests when using pg_basebackup. In short let's not add something more
for this patch.

> PFA is an updated version of this patch that:
> * documents a magic value passed to zlib (which is in their documentation as
> being a magic value, but has no define)
> * fixes the padding of tar files
> * adds a most basic test that the -X stream -Ft does produce a tarfile

So I had a more serious look at this patch, and it basically makes
more generic the operations done for the plain mode by adding a set of
routines that can be used by both tar and plain mode to work on the
WAL files streamed. Elegant.

+ <para>
+ The transaction log files will be written to
+ the <filename>base.tar</filename> file.
+ </para>
Nit: number of spaces here.

-mark_file_as_archived(const char *basedir, const char *fname)
+mark_file_as_archived(StreamCtl *stream, const char *fname)
Just passing WalMethod as argument would be enough, but... My patch
adding the fsync calls to pg_basebackup could just make use of
StreamCtl, so let's keep it as you suggest.

static bool
existsTimeLineHistoryFile(StreamCtl *stream)
{
[...]
+ return stream->walmethod->existsfile(histfname);
}
existsfile returns always false for the tar method. This does not
matter much because pg_basebackup exists immediately in case of a
failure, but I think that this deserves a comment in ReceiveXlogStream
where existsTimeLineHistoryFile is called.

I find the use of existsfile() in open_walfile() rather confusing
because this relies on the fact that existsfile() returns always
false for the tar mode. We could add an additional field in WalMethod
to store the method type and use that more, but that may make the code
more confusing than what you propose. What do you think?

+ int (*unlink) (const char *pathname);
The unlink method is used nowhere. This could just be removed.

-static void
+void
print_tar_number(char *s, int len, uint64 val)
This could be an independent patch. Or not.

I think that I found another bug regarding the contents of the
segments. I did pg_basebackup -F t -X stream, then untared pg_xlog.tar
which contained segment 1/0/2, then:
$ pg_xlogdump 000000010000000000000002
pg_xlogdump: FATAL: could not find a valid record after 0/2000000
I'd expect this segment to have records, up to a XLOG_SWITCH record.

> As for using XLOGDIR to drive the name of the tarfile. pg_basebackup is
> already hardcoded to use pg_xlog. And so are the tests. We probably want to
> fix that, but that's a separate step and this patch will be easier to review
> and test if we keep it out for now.

Yes. pg_basebackup is not the only thing here missing the point here,
here is most of the list:
$ git grep "pg_xlog\"" -- *.c *.h
src/backend/access/transam/xlog.c:static const char *xlogSourceNames[]
= {"any", "archive", "pg_xlog", "stream"};
src/backend/replication/basebackup.c: dir = AllocateDir("pg_xlog");
src/backend/replication/basebackup.c: (errmsg("could
not open directory \"%s\": %m", "pg_xlog")));
src/backend/replication/basebackup.c: while ((de = ReadDir(dir,
"pg_xlog")) != NULL)
src/backend/replication/basebackup.c: if (strcmp(pathbuf,
"./pg_xlog") == 0)
src/backend/storage/file/fd.c: if (lstat("pg_xlog", &st) < 0)
src/backend/storage/file/fd.c: "pg_xlog")));
src/backend/storage/file/fd.c: if (pgwin32_is_junction("pg_xlog"))
src/backend/storage/file/fd.c: walkdir("pg_xlog", pre_sync_fname,
false, DEBUG1);
src/backend/storage/file/fd.c: walkdir("pg_xlog",
datadir_fsync_fname, false, LOG);
src/bin/initdb/initdb.c: snprintf(pg_xlog, MAXPGPATH, "%s/pg_xlog", pg_data);
src/bin/initdb/initdb.c: subdirloc = psprintf("%s/pg_xlog", pg_data);
src/bin/pg_basebackup/pg_basebackup.c: snprintf(param->xlogdir,
sizeof(param->xlogdir), "%s/pg_xlog", basedir);
src/bin/pg_basebackup/pg_basebackup.c: if
(!((pg_str_endswith(filename, "/pg_xlog") ||
src/bin/pg_basebackup/pg_basebackup.c: linkloc =
psprintf("%s/pg_xlog", basedir);
src/bin/pg_rewind/copy_fetch.c: strcmp(path, "pg_xlog") == 0)
src/bin/pg_rewind/filemap.c: if (strcmp(path, "pg_xlog") == 0 &&
type == FILE_TYPE_SYMLINK)
src/bin/pg_rewind/filemap.c: if (exists &&
!S_ISDIR(statbuf.st_mode) && strcmp(path, "pg_xlog") != 0)
src/bin/pg_rewind/filemap.c: if (strcmp(path, "pg_xlog") == 0 &&
type == FILE_TYPE_SYMLINK)
src/bin/pg_upgrade/exec.c: "pg_xlog"};
src/include/access/xlog_internal.h:#define XLOGDIR "pg_xlog"

Thanks,
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-09-05 08:09:32 Re: pgbench - allow to store select results into variables
Previous Message Tsunakawa, Takayuki 2016-09-05 08:00:06 Re: Supporting SJIS as a database encoding