Re: Teach pg_receivewal to use lz4 compression

From: gkokolatos(at)pm(dot)me
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teach pg_receivewal to use lz4 compression
Date: 2021-11-04 09:53:32
Message-ID: 2zl2ZOBxSvzkN4pYrpA4de75NzHPf1ypoHEAvf-x8JxaEQVCXL4CjFVuSaM61jnnVtraPTjdBHlytKD8vRAXDeWgw0-4d58r2yqRPgKd65c=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> Thanks. I have looked at 0001 today, and applied it after fixing a
> couple of issues.

Great! Thank you very much.

> From memory:
> - zlib.h was missing from pg_receivewal.c, issue that I noticed after
> removing the redefinition of Z_DEFAULT_COMPRESSION because there was
> no need for it (did a run with a --without-zlib as well).

Yeah, I simply wanted to avoid adding a header. Either way works really.

> - Simplified a bit the error handling for incorrect option
> combinations, using a switch/case while on it.

Much cleaner done this way.

> - Renamed the existing variable "compression" in walmethods.c to
> compression_level, to reduce any confusion with the introduction of
> compression_method. One thing I have noticed is about the tar method,
> where we rely on the compression level to decide if compression should
> be used or not. There should be some simplifications possible there
> but there is a huge take in receivelog.c where we use COMPRESSION_NONE
> to track down that we still want to zero a new segment when using tar
> method.

Agreed.

> - Use of 'I' as short option name, err... After applying the first
> batch..

I left that in just to have the two compression related options next to each
other when switching. I assumed it might help with readability for the next
developer looking at it.

Removing it, is cleaner for the option definifion though, thanks.

>
> Based on the work of 0001, there were some conflicts with 0002. I
> have solved them while reviewing it, and adapted the code to what got
> already applied.

Thank you very much.

>
> + header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
> + if (LZ4F_isError(header_size))
> + {
> + pg_free(lz4buf);
> + close(fd);
> + return NULL;
> + }
> In dir_open_for_write(), I guess that this one is missing one
> LZ4F_freeCompressionContext().

Agreed.

>
> + status = LZ4F_freeDecompressionContext(ctx);
> + if (LZ4F_isError(status))
> + {
> + pg_log_error("could not free LZ4 decompression context: %s",
> + LZ4F_getErrorName(status));
> + exit(1);
> + }
> +
> + if (uncompressed_size != WalSegSz)
> + {
> + pg_log_warning("compressed segment file \"%s\" has
> incorrect uncompressed size %ld, skipping",
> + dirent->d_name, uncompressed_size);
> + (void) LZ4F_freeDecompressionContext(ctx);
> + continue;
> + }
> When the uncompressed size does not match out expected size, the
> second LZ4F_freeDecompressionContext() looks unnecessary to me because
> we have already one a couple of lines above.

Agreed.

>
> + ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
> + lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
> + if (LZ4F_isError(ctx_out))
> + {
> + close(fd);
> + return NULL;
> + }
> LZ4F_compressBound() can be after the check on ctx_out, here.
>
> + while (1)
> + {
> + char *readp;
> + char *readend;
> Simply looping when decompressing a segment to check its size looks
> rather unsafe to me. We should leave the loop once uncompressed_size
> is strictly more than WalSegSz.

The loop exits when done reading or when it failed to read:

+ r = read(fd, readbuf, sizeof(readbuf));
+ if (r < 0)
+ {
+ pg_log_error("could not read file \"%s\": %m", fullpath);
+ exit(1);
+ }
+
+ /* Done reading */
+ if (r == 0)
+ break;

Although I do agree that it can exit before that, if the uncompressed size is
greater than WalSegSz.

>
> The amount of TAP tests looks fine, and that's consistent with what we
> do for zlib^D^D^D^Dgzip. Now, when testing manually pg_receivewal
> with various combinations of gzip, lz4 and none, I can see the
> following failure in the code that calculates the streaming start
> point:
> pg_receivewal: error: could not decompress file
> "wals//000000010000000000000006.lz4": ERROR_frameType_unknown
>

Hmmm.... I will look into that.

> In the LZ4 code, this points to lib/lz4frame.c, where we read an
> incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
> The segments written by pg_receivewal are clean. Please note that
> this shows up as well when manually compressing some segments with a
> simple lz4 command, to simulate for example the case where a user
> compressed some segments by himself/herself before running
> pg_receivewal.
>

Rights, thank you for investigating. I will look further.

> So, tour code does LZ4F_createDecompressionContext() followed by a
> loop on read() and LZ4F_decompress() that relies on an input and an
> output buffer of a fixed 4kB size (we could use 64kB at least here
> actually?). So this set of loops looks rather correct to me.
>

For what is worth, in a stand alone program I wrote while investigating, I did
not notice any noteworthy performance gain, when decompressing files of original
size similar to common WalSegSz values, using 4kB, 8kB, 16kB and 32kB buffers.
I did not try 64kB though. This was by no means exhaustive performance testing,
though good enough to propose a value. I chose 4kB because it is small enough to
have in the stack. I thought anything bigger should be heap alloced and that
would add a bit more distraction in the code with the pg_free() calls.

I will re-write to use 64kB in the heap.

> Now, this part is weird:
> + while (readp < readend)
> + {
> + size_t read_size = 1;
> + size_t out_size = 1;
>
> I would have expected read_size to be (readend - readp) to match with
> the remaining data in the read buffer that we still need to read.
> Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
> output buffer where all the contents are read? By setting it to 1, I
> think that this is doing more loops into LZ4F_decompress() than really
> necessary.

You are very correct. An oversight when moving code over from my program and
renaming variables. Consider me embarrassed.

> It would not hurt either to memset(0) those buffers before
> they are used, IMO.

It does not hurt, yet I do not think that is necessary because one buffer is
throw away, i.e. the program writes to it but we never read it, and the other is
overwritten during the read call.

> I am not completely sure either, but should we
> use the number of bytes returned by LZ4F_decompress() as a hint for
> the follow-up loops?

It is possible, though in my humble opinion it adds some code and has no
measurable effect in the code.

> > + status = LZ4F_decompress(ctx, outbuf, &out_size,
> > + readbuf, &read_size, NULL);

> And... It happens that the error from v9 is here, where we need to
> read the amount of remaining data from "readp", and not "readbuf" :)

Agreed.

I really suck at renaming all the things... I am really embarrassed.

>
> Attached is an updated patch, which includes fixes for most of the
> issues I am mentioning above. Please note that I have not changed
> FindStreamingStart(), so this part is the same as v9.

Thanks!

Cheers,
//Georgios

>
> Thanks,
> --
> Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-11-04 09:56:54 Re: Columns correlation and adaptive query optimization
Previous Message Daniel Gustafsson 2021-11-04 09:36:37 Re: [patch] [doc] Further note required activity aspect of automatic checkpoint and archving