From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Support for pg_receivexlog --format=plain|tar
Date: 2017-01-15 05:44:33
Message-ID: CAB7nPqRLPj4e83wS2jUmhjWmqTQkZjys7H+RedsPu_D7UeQsrQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2017 at 1:31 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>
> On Tue, Jan 10, 2017 at 3:01 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
>> Based on some analysis, it is enough to look at the last 4 bytes of
>> the compressed file to get the size of the output data with a single
>> call to lseek() and then read(). As there is a simple way to do
>> things, and it is far cheaper than decompressing perhaps hundreds of
>> segments, I'd rather do it this way. Attached is the implementation.
>> This code uses 2 booleans for the 4 states of the file names: full
>> non-compressed, partial non-compressed, full compressed and partial
>> compressed. This keeps the last check of FindStreamingStart() simpler,
>> though it has become fashionable lately to use an enum for such
>> things :D
>
>
> I think you need to document that analysis at least in a code comment. I
> assume this is in reference to the ISIZE member of the gzip format?

Good question. I am not sure about this one.
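
Yes, this is the ISIZE field: per RFC 1952, a gzip stream ends with an
8-byte trailer of CRC32 followed by ISIZE, the uncompressed size modulo
2^32, stored little-endian. A minimal sketch in Python rather than the
patch's C (the payload size and temp file are made up for illustration):

```python
import gzip
import os
import struct
import tempfile

payload = b"x" * 16384  # stand-in for a WAL segment's contents

fd, path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    f.seek(-4, os.SEEK_END)                    # one lseek() ...
    (isize,) = struct.unpack("<I", f.read(4))  # ... and one read()

print(isize)  # 16384, the uncompressed size, without inflating anything
os.remove(path)
```

A corrupted trailer would yield junk here, which is why comparing the
value against XLogSegSize and rejecting anything else acts as a cheap
sanity check.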

> I was going to say what happens if the file is corrupt and we get random
> junk data there, but as we only compare it to XlogSegSize that should be
> safe. But we might want to put a note in about that too?

Perhaps. I have made a better attempt at that in FindStreamingStart.

> Finally, I think we should make the error message clearly say "compressed
> segment file" - just to make things extra clear.

Sure.

>> > I found another problem with it -- it is completely broken in sync mode.
>> > You
>> > need to either forbid sync mode together with compression, or teach
>> > dir_sync() about it. The later would sound better, but I wonder how much
>> > that's going to kill compression due to the small blocks? Is it a
>> > reasonable
>> > use-case?
>>
>> Hm. Looking at the docs, I think that --compress combined with
>> --synchronous maps to the use of Z_SYNC_FLUSH with gzflush(). FWIW I
>> don't have a direct use case for it, but it is not a major effort to
>> add, so done. There is no actual reason to forbid this combination of
>> options either.
>>
> Is it enough to just gzflush()? Don't we also need to still fsync()? I
> can't see any documentation saying that gzflush() does that, and a
> quick look at the zlib code indicates it doesn't (but I didn't check in
> detail, so if you did please point me to it).

Hm. It looks like you are right. zlib goes down to _tr_flush_bits() to
flush some output, but this finishes only with put_byte(). As the fd is
opaque inside gzFile, it would be better to open() the file first and
then use gzdopen() to get the gzFile. Let's also use the existing fd
field to save it. Per the zlib documentation, gzclose() closes the
underlying fd as well.
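
The resulting pattern can be sketched in Python rather than the patch's
C (file name and payload are made up; note that unlike C's gzclose(),
Python's GzipFile does not close a caller-supplied file object, so it is
closed explicitly here):

```python
import gzip
import os
import tempfile
import zlib

fd, path = tempfile.mkstemp(suffix=".gz")
raw = os.fdopen(fd, "wb")                     # open() the fd first ...
gz = gzip.GzipFile(fileobj=raw, mode="wb")    # ... then wrap it, like gzdopen()

gz.write(b"some WAL bytes")
gz.flush(zlib.Z_SYNC_FLUSH)  # like gzflush(): pushes a sync point to the OS
os.fsync(gz.fileno())        # the durability step gzflush() does NOT provide

gz.close()   # writes the gzip trailer (CRC32 + ISIZE)
raw.close()  # close the kept fd ourselves; gzclose() would do this in C
```

Keeping the raw fd around is what makes the fsync() possible at all,
which is the point of opening the file before handing it to the
compression layer.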
--
Michael

Attachment Content-Type Size
receivexlog-gzip-v5.patch application/stream 12.8 KB
