Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Georgios <gkokolatos(at)protonmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-06-26 15:55:32
Message-ID: 20220626155532.GA9311@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Will you be able to send a rebased patch for the next CF ?

If you update for the review comments I sent in March, I'll plan to do another
round of review.

On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.
>
> I ran into that on an ubuntu LTS, so I don't think it's so old that it
> shouldn't be handled more gracefully. LZ4 should either have an explicit
> version check, or else shouldn't depend on that feature (or should define a
> safe fallback version if the library header doesn't define it).
>
> https://packages.ubuntu.com/liblz4-1
>
> 0003: typo: of legacy => or legacy
>
> There are a large number of ifdefs being added here - it'd be nice to minimize
> that. basebackup was organized to use separate files, which is one way.
>
> $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
> src/bin/pg_dump/compress_io.c:19
>
> In last year's CF entry, I had made a union within CompressorState. LZ4
> doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
> ZSTD_CStream).
>
> 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> from commit ffd53659c. You're passing both the compression method *and* level.
> I think there should be a structure which includes both. In the future, that
> can also handle additional options. I hope to re-use these same things for
> wal_compression=method:level.
>
> You renamed this:
>
> |- COMPR_ALG_LIBZ
> |-} CompressionAlgorithm;
> |+ COMPRESSION_GZIP,
> |+} CompressionMethod;
>
> ..But I don't think that's an improvement. If you were to change it, it should
> say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
> structs and typedefs. zlib is not idential to gzip, which uses a different
> header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.
>
> The cf* changes in pg_backup_archiver could be split out into a separate
> commit. It's strictly a code simplification - not just preparation for more
> compression algorithms. The commit message should "See also:
> bf9aa490db24b2334b3595ee33653bf2fe39208c".
>
> The changes in 0002 for cfopen_write seem insufficient:
> |+ if (compressionMethod == COMPRESSION_NONE)
> |+ fp = cfopen(path, mode, compressionMethod, 0);
> | else
> | {
> | #ifdef HAVE_LIBZ
> | char *fname;
> |
> | fname = psprintf("%s.gz", path);
> |- fp = cfopen(fname, mode, compression);
> |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel);
> | free_keep_errno(fname);
> | #else
>
> The only difference between the LIBZ and uncompressed case is the file
> extension, and it'll be the only difference with LZ4 too. So I suggest to
> first handle the file extension, and the rest of the code path is not
> conditional on the compression method. I don't think cfopen_write even needs
> HAVE_LIBZ - can't you handle that in cfopen_internal() ?
>
> This patch rejects -Z0, which ought to be accepted:
> ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
> pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set
>
> Your 0003 patch shouldn't reference LZ4:
> +#ifndef HAVE_LIBLZ4
> + if (*compressionMethod == COMPRESSION_LZ4)
> + supports_compression = false;
> +#endif
>
> The 0004 patch renames zlibOutSize to outsize - I think the patch series should
> be constructed such as to minimize the size of the method-specific patches. I
> say this anticipating also adding support for zstd. The preliminary patches
> should have all the boring stuff. It would help for reviewing to keep the
> patches split up, or to enumerate all the boring things that are being renamed
> (like change OutputContext to cfp, rename zlibOutSize, ...).
>
> 0004: The include should use <lz4.h> and not "lz4.h"
>
> freebsd/cfbot is failing.
>
> I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> exercise it more on CI.

On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote:
> On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> > You're passing both the compression method *and* level. I think there should
> > be a structure which includes both. In the future, that can also handle
> > additional options.
>
> I'm not sure if there's anything worth saving, but I did that last year with
> 0003-Support-multiple-compression-algs-levels-opts.patch
> I sent a rebased copy off-list.
> https://www.postgresql.org/message-id/flat/20210104025321(dot)GA9712(at)telsasoft(dot)com#ca1b9f9d3552c87fa874731cad9d8391
>
> | fatal("not built with LZ4 support");
> | fatal("not built with lz4 support");
>
> Please use consistent capitalization of "lz4" - then the compiler can optimize
> away duplicate strings.
>
> > 0004: The include should use <lz4.h> and not "lz4.h"
>
> Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-06-26 16:14:56 Re: doc: Clarify Savepoint Behavior
Previous Message Erik Rijkers 2022-06-26 15:44:22 JSON/SQL: jsonpath: incomprehensible error message