Re: pg_basebackup stream xlog to tar

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup stream xlog to tar
Date: 2016-09-29 10:44:48
Message-ID: CABUevExJSYiR5DEA7KCwwwwfcC8MWfBq8bNHkOZNisi5nDs6Pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 5, 2016 at 10:01 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Sat, Sep 3, 2016 at 10:35 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> > Ugh. That would be nice to have, but I think that's outside the scope of
> > this patch.
>
> A test for this patch that could have value would be to use
> pg_basebackup -X stream -Ft, then untar pg_xlog.tar and look at the
> size of the segments. If you have an idea to untar something without
> the in-core perl support because we need to have the TAP stuff able to
> run on at least 5.8.8, I'd welcome an idea. Honestly I still have
> none, and that's why the recovery tests do not use tarballs in their
> tests when using pg_basebackup. In short let's not add something more
> for this patch.
>
> > PFA is an updated version of this patch that:
> > * documents a magic value passed to zlib (which is in their
> documentation as
> > being a magic value, but has no define)
> > * fixes the padding of tar files
> > * adds a most basic test that the -X stream -Ft does produce a tarfile
>
> So I had a more serious look at this patch, and it basically makes
> more generic the operations done for the plain mode by adding a set of
> routines that can be used by both tar and plain mode to work on the
> WAL files streamed. Elegant.
>
> + <para>
> + The transaction log files will be written to
> + the <filename>base.tar</filename> file.
> + </para>
> Nit: number of spaces here.
>

Fixed.

> -mark_file_as_archived(const char *basedir, const char *fname)
> +mark_file_as_archived(StreamCtl *stream, const char *fname)
> Just passing WalMethod as argument would be enough, but... My patch
> adding the fsync calls to pg_basebackup could just make use of
> StreamCtl, so let's keep it as you suggest.
>

Yeah, I think it's cleaner to pass the whole structure around really. If
not now, we'd need it eventually. That makes all callers more consistent.

> static bool
> existsTimeLineHistoryFile(StreamCtl *stream)
> {
> [...]
> + return stream->walmethod->existsfile(histfname);
> }
> existsfile returns always false for the tar method. This does not
> matter much because pg_basebackup exists immediately in case of a
> failure, but I think that this deserves a comment in ReceiveXlogStream
> where existsTimeLineHistoryFile is called.
>

OK, added. As you say, the behaviour is expected, but it makes sense to
mention it clealry there.

> I find the use of existsfile() in open_walfile() rather confusing
> because this relies on the fact that existsfile() returns always
> false for the tar mode. We could add an additional field in WalMethod
> to store the method type and use that more, but that may make the code
> more confusing than what you propose. What do you think?
>

Yeah, I'm not sure that helps. The point is that the abstraction is
supposed to take care of that. But if it's confusing, then clearly a
comment is warranted there, so I've added that. Do you think that makes it
clear enough?

>
> + int (*unlink) (const char *pathname);
> The unlink method is used nowhere. This could just be removed.
>

That's clearly a missed cleanup. Removed, thanks.

> -static void
> +void
> print_tar_number(char *s, int len, uint64 val)
> This could be an independent patch. Or not.
>

Could be, but we don't really have any other uses for it.

> I think that I found another bug regarding the contents of the
> segments. I did pg_basebackup -F t -X stream, then untared pg_xlog.tar
> which contained segment 1/0/2, then:
> $ pg_xlogdump 000000010000000000000002
> pg_xlogdump: FATAL: could not find a valid record after 0/2000000
> I'd expect this segment to have records, up to a XLOG_SWITCH record.
>

Ugh. That's definitely broken yes. It seeked back and overwrote the tar
header with the data, instead of starting where the file part was supposed
to be. It worked fine on compressed files, and it's when implementing that
that it broke.

So what's our basic rule for these perl tests - are we allowed to use
pg_xlogdump from within a pg_basebackup test? If so that could actually be
a useful test - do the backup, extract the xlog and verify that it contains
the SWITCH record.

I also noticed that using -Z5 created a .tar.gz and -z created a .tar
(which was compressed). Because compresslevel is set to -1 with -z,
meaning default.

Again, apologies for getting late back into the game here.

//Magnus

Attachment Content-Type Size
pg_basebackup_stream_tar_v3.patch text/x-patch 44.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-09-29 10:52:40 Re: Tuplesort merge pre-reading
Previous Message Jeevan Chalke 2016-09-29 10:18:09 Re: Add support for restrictive RLS policies