Re: Teach pg_receivewal to use lz4 compression

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teach pg_receivewal to use lz4 compression
Date: 2021-11-01 08:09:08
Message-ID: YX+gpFH2s9MG5X3D@paquier.xyz
Lists: pgsql-hackers

On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
> Why would the header size change between the moment the segment is
> begun and it is finished? We could store it in memory and write it
> again when the segment is closed instead, even if it means to fseek()
> back to the beginning of the file once the segment is completed.
> Storing WalSegSz from the moment a segment is opened makes the code
> weaker to SIGINTs and the kind, so this does not fix the problem I
> mentioned previously :/

I got to think more on this one, and another argument against storing
an incorrect contentSize while the segment is not completed is that it
would break the case of partial segments with --synchronous, where we
should still be able to recover as much flushed data as possible.  Like zlib,
where one has to complete the partial segment with zeros after
decompression until the WAL segment size is reached, we should be able
to support that with LZ4. (I have saved some customer data in the
past thanks to this property, btw.)
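
To illustrate the zero-padding step, here is a minimal sketch in plain C.
It assumes the partial segment has already been decompressed to a file;
pad_partial_segment() is a hypothetical helper written for this example,
not something that exists in pg_receivewal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append zeros to an already-decompressed partial
 * segment until it is exactly wal_seg_size bytes long.
 * Returns 0 on success, -1 on failure (including a file that is already
 * larger than wal_seg_size).
 */
int
pad_partial_segment(const char *path, long wal_seg_size)
{
    FILE       *fp;
    char        zeros[8192];
    long        cur;

    assert(path != NULL && wal_seg_size > 0);

    fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    /* Find the current length of the file; writes below start from here. */
    if (fseek(fp, 0, SEEK_END) != 0 ||
        (cur = ftell(fp)) < 0 ||
        cur > wal_seg_size)
    {
        fclose(fp);
        return -1;
    }

    memset(zeros, 0, sizeof(zeros));
    while (cur < wal_seg_size)
    {
        size_t      n = sizeof(zeros);

        /* Do not write past the target segment size. */
        if ((long) n > wal_seg_size - cur)
            n = (size_t) (wal_seg_size - cur);
        if (fwrite(zeros, 1, n, fp) != n)
        {
            fclose(fp);
            return -1;
        }
        cur += (long) n;
    }

    return (fclose(fp) == 0) ? 0 : -1;
}
```

With the default 16MB segment size, that would be called as
pad_partial_segment(path, 16 * 1024 * 1024) on the decompressed file.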

If it proves to be too fancy to rewrite the header with a correct
contentSize once the segment is completed, another way would be to
enforce a decompression of each segment in-memory. The advantage of
this method is that we would be maximally portable.  For example, if
one begins to use pg_receivewal on an archive directory where we used
an archive_command, we would be able to grab the starting LSN. That's
more costly of course, but the LZ4 protocol does not make that easy
either with its chunk protocol. By the way, you are right that we
should worry about the variability in size of the header as we only
have the guarantee that it can be within a given window.  I missed
that and lz4frame.h mentions that around LZ4F_headerSize :/

It would be good to test with many segments, but could we think about
just relying on LZ4F_decompress() with a frame and compute the
decompressed size by ourselves? At least that will never break, and
that would work in all the cases pg_receivewal aims to support.
--
Michael
