Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-06-28 08:14:25
Message-ID: UsTnh6pNlF-Kichjme-H98KcAIycxLqV_BNBh4OvGLinGadL2KlsRlW2LHzB8qDbABTrLJ6CSTtENifm7mXOinRUXftSwZ3_rWj7vsPYVAc=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Sunday, June 26th, 2022 at 5:55 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

>
>
> Hi,
>
> Will you be able to send a rebased patch for the next CF ?

Thank you for taking an interest in the PR. The plan is indeed to sent
a new version.

> If you update for the review comments I sent in March, I'll plan to do another
> round of review.

Thank you.

>
> On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
>
> > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.
> >
> > I ran into that on an ubuntu LTS, so I don't think it's so old that it
> > shouldn't be handled more gracefully. LZ4 should either have an explicit
> > version check, or else shouldn't depend on that feature (or should define a
> > safe fallback version if the library header doesn't define it).
> >
> > https://packages.ubuntu.com/liblz4-1
> >
> > 0003: typo: of legacy => or legacy
> >
> > There are a large number of ifdefs being added here - it'd be nice to minimize
> > that. basebackup was organized to use separate files, which is one way.
> >
> > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
> > src/bin/pg_dump/compress_io.c:19
> >
> > In last year's CF entry, I had made a union within CompressorState. LZ4
> > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
> > ZSTD_CStream).
> >
> > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> > from commit ffd53659c. You're passing both the compression method and level.
> > I think there should be a structure which includes both. In the future, that
> > can also handle additional options. I hope to re-use these same things for
> > wal_compression=method:level.
> >
> > You renamed this:
> >
> > |- COMPR_ALG_LIBZ
> > |-} CompressionAlgorithm;
> > |+ COMPRESSION_GZIP,
> > |+} CompressionMethod;
> >
> > ..But I don't think that's an improvement. If you were to change it, it should
> > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
> > structs and typedefs. zlib is not idential to gzip, which uses a different
> > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.
> >
> > The cf* changes in pg_backup_archiver could be split out into a separate
> > commit. It's strictly a code simplification - not just preparation for more
> > compression algorithms. The commit message should "See also:
> > bf9aa490db24b2334b3595ee33653bf2fe39208c".
> >
> > The changes in 0002 for cfopen_write seem insufficient:
> > |+ if (compressionMethod == COMPRESSION_NONE)
> > |+ fp = cfopen(path, mode, compressionMethod, 0);
> > | else
> > | {
> > | #ifdef HAVE_LIBZ
> > | char *fname;
> > |
> > | fname = psprintf("%s.gz", path);
> > |- fp = cfopen(fname, mode, compression);
> > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel);
> > | free_keep_errno(fname);
> > | #else
> >
> > The only difference between the LIBZ and uncompressed case is the file
> > extension, and it'll be the only difference with LZ4 too. So I suggest to
> > first handle the file extension, and the rest of the code path is not
> > conditional on the compression method. I don't think cfopen_write even needs
> > HAVE_LIBZ - can't you handle that in cfopen_internal() ?
> >
> > This patch rejects -Z0, which ought to be accepted:
> > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
> > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set
> >
> > Your 0003 patch shouldn't reference LZ4:
> > +#ifndef HAVE_LIBLZ4
> > + if (*compressionMethod == COMPRESSION_LZ4)
> > + supports_compression = false;
> > +#endif
> >
> > The 0004 patch renames zlibOutSize to outsize - I think the patch series should
> > be constructed such as to minimize the size of the method-specific patches. I
> > say this anticipating also adding support for zstd. The preliminary patches
> > should have all the boring stuff. It would help for reviewing to keep the
> > patches split up, or to enumerate all the boring things that are being renamed
> > (like change OutputContext to cfp, rename zlibOutSize, ...).
> >
> > 0004: The include should use <lz4.h> and not "lz4.h"
> >
> > freebsd/cfbot is failing.
> >
> > I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> > exercise it more on CI.
>
>
> On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote:
>
> > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> >
> > > You're passing both the compression method and level. I think there should
> > > be a structure which includes both. In the future, that can also handle
> > > additional options.
> >
> > I'm not sure if there's anything worth saving, but I did that last year with
> > 0003-Support-multiple-compression-algs-levels-opts.patch
> > I sent a rebased copy off-list.
> > https://www.postgresql.org/message-id/flat/20210104025321(dot)GA9712(at)telsasoft(dot)com#ca1b9f9d3552c87fa874731cad9d8391
> >
> > | fatal("not built with LZ4 support");
> > | fatal("not built with lz4 support");
> >
> > Please use consistent capitalization of "lz4" - then the compiler can optimize
> > away duplicate strings.
> >
> > > 0004: The include should use <lz4.h> and not "lz4.h"
> >
> > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-06-28 08:32:30 Separate the attribute physical order from logical order
Previous Message Michael Paquier 2022-06-28 07:54:24 Re: Logging query parmeters in auto_explain