Re: Teach pg_receivewal to use lz4 compression

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teach pg_receivewal to use lz4 compression
Date: 2021-07-09 07:47:47
Message-ID: m9tNOULZ1-ZZwdCVNSfWe1HkASTFeobtXkMeewSY--McJMAe1lIx4nmVSiUh8xU1Ano7lwwdCJTLdjPWxqoRVNep325VH8sLCvjHkH9RvOM=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos(at)pm(dot)me wrote:
>
> > please find v2 of the patch which tries to address the commends
> >
> > received so far.
>
> Thanks!
>
> > Michael Paquier wrote:
> >
> > > - system_or_bail('lz4', '-t', $lz4_wals[0]);
> > >
> > > I think that you should just drop this part of the test. The only
> > >
> > > part of LZ4 that we require to be present when Postgres is built with
> > >
> > > --with-lz4 is its library liblz4. Commands associated to it may not
> > >
> > > be around, causing this test to fail. The test checking that one .lz4
> > >
> > > file has been created is good to have. It may be worth adding a test
> > >
> > > with a .lz4.partial segment generated and --endpos pointing to a LSN
> > >
> > > that does not finish the segment that gets switched.
> >
> > I humbly disagree with the need for the test. It is rather easily possible
> >
> > to generate a file that can not be decoded, thus becoming useless. Having the
> >
> > test will provide some guarantee for the fact. In the current patch, there
> >
> > is code to find out if the program lz4 is available in the system. If it is
> >
> > not available, then that specific test is skipped. The rest remains as it
> >
> > were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> >
> > so that the test counted and the execution of tests continues upon failure.
>
> Check. I can see what you are using in the new patch. We could live
>
> with that.

Great. Thank you.

>
> > > It seems to me that you are missing some logic in
> > >
> > > FindStreamingStart() to handle LZ4-compressed segments, in relation
> > >
> > > with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
> >
> > Very correct. The logic is now added. Given the lz4 api, one has to fill
> >
> > in the uncompressed size during creation time. Then one can read the
> >
> > headers and verify the expectations.
>
> Yeah, I read that as well with lz4 --list and the kind. That's weird
>
> compared to how ZLIB gives an easy access to it. We may want to do an
>
> effort and tell about more lz4 --content-size/--list, telling that we
>
> add the size in the segment compressed because we have to and LZ4 does
>
> not do it by default?

I am afraid I do not follow. In the patch we do add the uncompressed size
and then, the uncompressed size is checked against a known value WalSegSz.
What the compressed size will be checked against?

>
> > > Should we have more tests for ZLIB, while on it? That seems like a
> > >
> > > good addition as long as we can skip the tests conditionally when
> > >
> > > that's not supported.
> >
> > Agreed. Please allow me to provide a distinct patch for this code.
>
> Thanks. Looking forward to seeing it. That may be better on a
>
> separate thread, even if I digressed in this thread :)

Thank you for the comments. I will sent in a separate thread.

>
> > > I think we can somehow use "acceleration" parameter of lz4 compression
> > >
> > > to map on compression level, It is not direct mapping but
> > >
> > > can't we create some internal mapping instead of completely ignoring
> > >
> > > this option for lz4, or we can provide another option for lz4?
> >
> > We can, though I am not in favour of doing so. There is seemingly
> >
> > little benefit for added complexity.
>
> Agreed.
>
> > > What I think is important for the user when it comes to this
> > >
> > > option is the consistency of its naming across all the tools
> > >
> > > supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
> > >
> > > already use most of the short options you could use for pg_receivewal,
> > >
> > > having only a long one gives a bit more flexibility.
> >
> > Done.
>
> * http://www.zlib.org/rfc-gzip.html.
>
> - - For lz4 compressed segments
>
> */
>
> This comment is incomplete.

It is. I will fix.

>
> +#define IsLZ4CompressXLogFileName(fname) \
> - (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> - strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> - strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
> +#define IsLZ4PartialCompressXLogFileName(fname) \
> - (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> - strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> - strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
> This is getting complicated. Would it be better to change this stuff
>
> and switch to a routine that checks if a segment has a valid name, is
>
> partial, and the type of compression that applied to it? It seems to
>
> me that we should group iszlibcompress and islz4compress together with
>
> the options available through compression_method.

I agree with you. I will refactor.

> - if (compresslevel != 0)
> - {
> - if (compression_method == COMPRESSION_NONE)
>
>
> - {
>
>
> - compression_method = COMPRESSION_ZLIB;
>
>
> - }
>
>
> - if (compression_method != COMPRESSION_ZLIB)
>
>
> - {
>
>
> - pg_log_error("cannot use --compress when "
>
>
> - "--compression-method is not gzip");
>
>
> - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> - progname);
>
>
> - exit(1);
>
>
> - }
>
>
> - }
>
> For compatibility where --compress enforces the use of zlib that would
>
> work, but this needs a comment explaining the goal of this block. I
>
> am wondering if it would be better to break the will and just complain
>
> when specifying --compress without --compression-method though. That
>
> would cause compatibility issues, but this would make folks aware of
>
> the presence of LZ4, which does not sound bad to me either as ZLIB is
>
> slower than LZ4 on all fronts.

I would vote to break the compatibility if that is an option. I chose the
less invasive approach thinking that breaking the compatibility would not
be an option.

Unless others object, I will include --compress as a complimentary option
to --compression-method in updated version of the patch.

> - else if (compression_method == COMPRESSION_ZLIB)
> - {
> - pg_log_error("cannot use --compression-method gzip when "
>
>
> - "--compression is 0");
>
>
> - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> - progname);
>
>
> - exit(1);
>
>
> - }
>
> Hmm. It would be more natural to enforce a default compression level
>
> in this case? The user is asking for a compression with zlib here.

You are correct, in the current patch passing --compression-method=gzip alone
is equivalent to passing --compression=0 in the current master version. This
behaviour may be confusing for the user. What should the default compression
be then? I am inclined to say '5' as a compromise between speed and compression
ration.

> - my $lz4 = $ENV{LZ4};
>
> [...]
> - Verify that the stored file is readable if program lz4 is available
> ===================================================================
>
> - skip "program lz4 is not found in your system", 1
> - if (!defined $lz4 || $lz4 eq '');
>
>
>
> Okay, this is acceptable. Didn't know the existing trick with TAR
>
> either.

Thank you.

>
> - /*
>
>
> - * XXX: this is crap... lz4preferences now does show the uncompressed
>
>
> - * size via lz4 --list <filename> but the compression goes down the
>
>
> - * window... also it is not very helpfull to have it at the startm, does
>
>
> - * it?
>
>
> - */
>
>
>
> What do you mean here by "the compression goes out the window"?

Please consider me adequately embarrassed. This was a personal comment while I was
working on the code. It is not correct and it should have never seen the public
light.

Cheers,
//Georgios

> ---------------------------------------------------------------
>
> Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-07-09 07:50:28 Re: Incorrect usage of strtol, atoi for non-numeric junk inputs
Previous Message Peter Smith 2021-07-09 07:36:04 psql tab auto-complete for CREATE PUBLICATION