Re: Support for pg_receivexlog --format=plain|tar

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support for pg_receivexlog --format=plain|tar
Date: 2016-12-27 18:12:31
Message-ID: CABUevEwgrysESOSsJBy+wGAyoSxhyEn6mMFwXbfQv+=URMDd5g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 27, 2016 at 1:16 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Tue, Dec 27, 2016 at 6:34 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> > On Tue, Dec 27, 2016 at 2:23 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> > wrote:
> >> Magnus, you mentioned to me as well that you had a couple of ideas
> >> on the matter, feel free to jump in and let's mix our thoughts!
> >
> >
> > Yeah, I've been wondering what the actual usecase is here :)
>
> There is value in compressing segments that end with trailing zeros,
> even if they are not the most common kind of segment in the WAL
> archive.
>

Agreed on that part -- that's the value in compression though, and not
necessarily the TAR format itself.
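
To put a rough number on the trailing-zeros case, here is a minimal sketch.
It assumes the default 16MB segment size and fakes a segment as random bytes
followed by a zero-filled tail. Real WAL is not random and an unused tail is
not guaranteed to be zeros, so treat the result as an illustration only. The
helper names are made up for this example.

```python
import gzip
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16MB WAL segment size


def make_segment(used_bytes):
    """Build a fake segment: random payload followed by trailing zeros."""
    return os.urandom(used_bytes) + bytes(SEGMENT_SIZE - used_bytes)


def compressed_size(segment, level=6):
    """Return the size in bytes of the gzip-compressed segment."""
    return len(gzip.compress(segment, compresslevel=level))


if __name__ == "__main__":
    segment = make_segment(1024 * 1024)  # 1MB used, the rest is zeros
    print(len(segment), compressed_size(segment))
```

The zero tail shrinks to almost nothing whether or not the result is wrapped
in a tar container, which is the point above.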

Is there any value in the TAR format *without* compression in your scenario?

> > Though I was considering the case where all segments are streamed into the
> > same tarfile (and then some sort of configurable limit where we'd switch
> > tarfile after <n> segments, which rapidly started to feel too complicated).
> >
> > What's the actual advantage of having it wrapped inside a single tarfile?
>
> I am advocating for one tar file per segment to be honest. Grouping
> them makes the failure handling more complicated when connection to
> the server is killed, or the replication stream is cut. Well, not
> really complicated actually, because I think that you would need to
> drop in the segment folder a status file with enough information to
> let pg_receivexlog know from where in the tar file it needs to
> continue writing. If a new tarball is created for each segment,
> deciding from where to stream after a connection failure is just a
> matter of doing what is done today: having a look at the completed
> segments and begin streaming from the incomplete/absent one.
>

This pretty much matches up with the conclusion I got to myself as well. We
could create a new tarfile for each restart of pg_receivexlog, but then it
becomes unpredictable.
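
For reference, the "look at the completed segments and continue from the
incomplete/absent one" logic can be sketched roughly as below. This is not
the pg_receivexlog code, just an approximation. It assumes 16MB segments
(256 segments per xlogid), the 24-hex-digit segment file names, and the
".partial" suffix for an unfinished segment. Function names are invented
for the example, and compressed or tar-wrapped names are deliberately not
handled.

```python
import re

SEGMENTS_PER_XLOGID = 0x100  # assumption: default 16MB segment size

# <timeline><xlogid><segment>, 8 hex digits each, optional ".partial"
SEGMENT_RE = re.compile(
    r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$"
)


def segment_name(timeline, segno):
    """Format a WAL segment file name from timeline and segment number."""
    return "%08X%08X%08X" % (
        timeline, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID
    )


def find_streaming_start(filenames):
    """Return (timeline, segno) to resume from, or None if nothing is there.

    A trailing .partial segment is streamed again from its start; otherwise
    streaming resumes at the segment after the highest completed one.
    """
    best = None  # (segno, is_complete, timeline)
    for name in filenames:
        m = SEGMENT_RE.match(name)
        if m is None:
            continue  # ignore anything that is not a segment file
        segno = int(m.group(2), 16) * SEGMENTS_PER_XLOGID + int(m.group(3), 16)
        candidate = (segno, m.group(4) is None, int(m.group(1), 16))
        if best is None or candidate[:2] > best[:2]:
            best = candidate
    if best is None:
        return None
    segno, is_complete, timeline = best
    return (timeline, segno + 1 if is_complete else segno)
```

With one file per segment this only needs a directory listing. With a shared
tarfile you would additionally need to know the offset inside the archive,
which is where the status file comes in.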

> >> There are a couple of things that I have been considering as well for
> >> pg_receivexlog. Though they are not directly tied to this thread,
> >> here they are so that I don't forget about them:
> >> - Removal of oldest WAL segments on a partition. When writing WAL
> >> segments to a dedicated partition, we could have an option that
> >> automatically removes the oldest WAL segment if the partition is full.
> >> This triggers once a segment is completed.
> >> - Compression of fully-written segments. When a segment is finished
> >> being written, pg_receivexlog could compress it further with gz for
> >> example. With --format=t this leads to segnum.tar.gz being generated.
> >> The advantage of doing those two things in pg_receivexlog is
> >> monitoring. One process to handle them all, and there is no need of
> >> cron jobs to handle any cleanup or compression.
> >
> > I was at one point thinking that would be a good idea as well, but recently
> > I've more been thinking that what we should do is implement a
> > "--post-segment-command", which would act similar to archive_command but
> > started by pg_receivexlog. This could handle things like compression, and
> > also integration with external backup tools like backrest or barman in a
> > cleaner way. We could also spawn this without waiting for it to finish
> > immediately, which would allow parallelization of the process. When doing
> > the compression inline that rapidly becomes the bottleneck. Unlike a
> > basebackup you're only dealing with the need to buffer 16MB on disk before
> > compressing it, so it should be fairly cheap.
>
> I did not consider the case of barman and backrest to be honest,
> having the view of 2ndQ folks and David would be nice here. Still, the
> main idea behind having those done by the pg_receivexlog process itself
> would be to not spawn a new process. I have a class of users that care
> about things that could hang, they play a lot with network-mounted
> disks... And VMs of course.
>

I have been talking to David about it a couple of times, and he agreed that
it'd be useful to have a post-segment command. We haven't discussed it in
much detail though. I'll add him to direct-cc here to see if he has any
further input :)

It could be that the best idea is to just notify some other process of
what's happening. But making it an external command would give that a lot
of flexibility. Of course, we need to be careful not to put ourselves back
in the position we are in with archive_command, in that it's very difficult
to write a good one.
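
To make the idea a bit more concrete, here is a rough sketch of what a
post-segment hook could do. Nothing like this exists in pg_receivexlog
today: the option is only a proposal, and reusing the %p (full path), %f
(file name) and %% placeholders from archive_command is my assumption. The
function names are invented for the example.

```python
import os
import re
import subprocess


def expand_command(template, segment_path):
    """Expand %p, %f and %% in a command template (assumed syntax)."""
    replacements = {
        "%p": segment_path,
        "%f": os.path.basename(segment_path),
        "%%": "%",
    }
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], template)


def run_post_segment_command(template, segment_path):
    """Start the command for a finished segment without waiting for it.

    Popen returns as soon as the child is started, so streaming can go on
    while the command runs. The caller has to poll() or wait() on the
    returned object later to collect the exit status.
    """
    command = expand_command(template, segment_path)
    return subprocess.Popen(command, shell=True)
```

Even this small version shows the archive_command problem: quoting is left
to whoever writes the template, and it is up to the caller to decide what a
non-zero exit status or a command that never returns should mean.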

I'm sure everybody cares about things that could hang. But everything can
hang...

> > Another thing I've been considering in the same area would be to add the
> > ability to write the segments to a pipe instead of a directory. Then you
> > could just pipe it into gzip without the need to buffer on disk. This would
> > kill the ability to know at which point we'd sync()ed to disk, but in most
> > cases so will doing direct gzip. Just means we couldn't support this in
> > sync mode.
>
> Users piping their data don't care about reliability anyway. So that
> is not a problem.
>

Good point. Same would be true about people who gzip it, wouldn't it?

> > I can see the point of being able to compress the individual segments
> > directly in pg_receivexlog in smaller systems though, without the need to
> > rely on an external compression program as well. But in that case, is there
> > any reason we need to wrap it in a tarfile, and can't just write it to
> > <segment>.gz natively?
>
> You mean having a --compress=0|9 option that creates individual gz
> files for each segment? Definitely we could just do that. It would be
>

Yes, that's what I meant.

> a shame though to not use the WAL methods you have introduced in
> src/bin/pg_basebackup, which already cover both tar and tar.gz. A
> quick hack in pg_receivexlog has shown me that segments are saved in
> a single tarball, which is not cool. My feeling is that using the
> existing infrastructure, but making it pluggable for individual files
> (in short I think that what is needed here is a way to tell the WAL
> method to switch to a new file when a segment completes) would really
> be the simplest one in terms of code lines and maintenance.
>

Much as I'd like to reuse that, I don't think that reusing that in itself
should be the driver for how this should be decided. It should be the end
product.

To me it seems silly to create a directory full of tarfiles with a single
file in each. I don't particularly care about the fact that we added 512
bytes of wasted space to each, but we just created something that's
unnecessarily complicated for people to handle, didn't we? A plain
directory of .gz files is a lot easier to work with.
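
As a small illustration of the difference, the sketch below builds both
variants in memory with Python's standard gzip and tarfile modules. It is
not what the pg_basebackup WAL methods do internally, and tarfile adds its
own end-of-archive padding, so only the lower bound of one 512-byte header
per archive should be read into it. The function names are invented for the
example.

```python
import gzip
import io
import tarfile


def as_gzip(data, level=6):
    """Compress one segment the way a plain <segment>.gz would hold it."""
    return gzip.compress(data, compresslevel=level)


def as_single_member_tar(name, data):
    """Wrap one segment in an uncompressed tar archive with a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_segment_from_tar(archive, name):
    """Extra step a consumer needs: open the archive and find the member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(name).read()
```

Getting the segment back from the .gz is a single gzip.decompress() call (or
gunzip on the command line), while the tar variant needs the archive opened
and the member located first, for no gain when there is only one member.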

//Magnus
