Re: Teach pg_receivewal to use lz4 compression

From: gkokolatos(at)pm(dot)me
To: gkokolatos(at)pm(dot)me
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teach pg_receivewal to use lz4 compression
Date: 2021-07-08 14:18:40
Message-ID: PZnyf3PE6s8jQBf-CTQjAvjSRvw8dWWxYrLWR2NdqTrhm7wHfrQ2gUgwdlmGevfIxindJ3KooyuWQ5MRU9NTyMRiNMfRBkyVnqkyPH7nP_k=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi,

please find v2 of the patch which tries to address the commends received so far.

Thank you all for your comments.

Michael Paquier wrote:

> Documentation is missing from the patch.
>
It has now been added.

> + LZ4F_compressionContext_t ctx;
> + size_t outbufCapacity;
> + void *outbuf;
> It may be cleaner to refer to lz4 in the name of those variables?

Agreed and done

> + ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
> + outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default preferences */);
> Interesting. So this cannot be done at compilation time because of
> the auto-flush mode looking at the LZ4 code. That looks about right.

This is also my understanding.

> + system_or_bail('lz4', '-t', $lz4_wals[0]);
> I think that you should just drop this part of the test. The only
> part of LZ4 that we require to be present when Postgres is built with
> --with-lz4 is its library liblz4. Commands associated to it may not
> be around, causing this test to fail. The test checking that one .lz4
> file has been created is good to have. It may be worth adding a test
> with a .lz4.partial segment generated and --endpos pointing to a LSN
> that does not finish the segment that gets switched.

I humbly disagree with the need for the test. It is rather easily possible
to generate a file that can not be decoded, thus becoming useless. Having the
test will provide some guarantee for the fact. In the current patch, there
is code to find out if the program lz4 is available in the system. If it is
not available, then that specific test is skipped. The rest remains as it
were. Also `system_or_bail` is not used anymore in favour of the `system_log`
so that the test counted and the execution of tests continues upon failure.

> It seems to me that you are missing some logic in
> FindStreamingStart() to handle LZ4-compressed segments, in relation
> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill
in the uncompressed size during creation time. Then one can read the
headers and verify the expectations.

> Should we have more tests for ZLIB, while on it? That seems like a
> good addition as long as we can skip the tests conditionally when
> that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.

Dilip Kumar wrote:

> Wouldn't it be better to call it compression method instead of
> compression program?

Agreed. This is inline with the suggestions of other reviewers.
Find the change in the attached patch.

> I think we can somehow use "acceleration" parameter of lz4 compression
> to map on compression level, It is not direct mapping but
> can't we create some internal mapping instead of completely ignoring
> this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

> Should we also support LZ4 compression using dictionary?

I would we should not do that. If my understanding is correct,
decompression would require the dictionary to be passed along.
The algorithm seems to be very competitive to the current compression
as is.

Magnus Hagander wrote:

> I came here to say exactly that, just had to think up what I thought
> was the better name first. Either method or algorithm, but method
> seems like the much simpler choice and therefore better in this case.
>
> Should is also then not be --compression-method, rather than --compress-method?

Agreed and changed throughout.

Michael Paquier wrote:

> What I think is important for the user when it comes to this
> option is the consistency of its naming across all the tools
> supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
> already use most of the short options you could use for pg_receivewal,
> having only a long one gives a bit more flexibility.

Done.

Cheers,
//Georgios

Attachment Content-Type Size
v2-0001-Teach-pg_receivewal-to-use-lz4-compression.patch application/octet-stream 29.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2021-07-08 14:46:51 Re: pgbench logging broken by time logic changes
Previous Message Bruce Momjian 2021-07-08 13:58:35 Re: SHA-1 FIPS - compliance