Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-07-05 13:22:47
Message-ID: u7dLpRH_6r_rGyqNWHSDx2Wz5g1virAU3-f4XMC2BwiR81Tegqm4Sxtte0YAe77je9KVMa4jVxDNSt3XkYLYpyOoabCid1Bm7BZxwoitcmY=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Sunday, June 26th, 2022 at 5:55 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

>
> Hi,
>
> Will you be able to send a rebased patch for the next CF ?

Please find a rebased and heavily refactored patchset. Since parts of this
patchset were already committed, I restarted numbering. I am not certain if
this is the preferred way. This makes alignment with previous comments a bit
harder

> If you update for the review comments I sent in March, I'll plan to do another
> round of review.

I have updated for "some" of the comments. This is not an unwillingness to
incorporate those specific comments. Simply this patchset had started to divert
heavily already based on comments from Mr. Paquier who had already requested for
the APIs to be refactored to use function pointers. This is happening in 0002 of
the patchset. 0001 of the patchset is using the new compression.h under common.

This patchset should be considered a late draft, as commentary, documentation,
and some finer details are not yet finalized; because I am expecting the proposed
refactor to receive a wealth of comments. It would be helpful to understand if
the proposed direction is something worth to be worked upon, before moving to the
finer details.

For what is worth, I am the sole author of the current patchset.

Cheers,
//Georgios

> On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
>
> > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.
> >
> > I ran into that on an ubuntu LTS, so I don't think it's so old that it
> > shouldn't be handled more gracefully. LZ4 should either have an explicit
> > version check, or else shouldn't depend on that feature (or should define a
> > safe fallback version if the library header doesn't define it).
> >
> > https://packages.ubuntu.com/liblz4-1
> >
> > 0003: typo: of legacy => or legacy
> >
> > There are a large number of ifdefs being added here - it'd be nice to minimize
> > that. basebackup was organized to use separate files, which is one way.
> >
> > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
> > src/bin/pg_dump/compress_io.c:19
> >
> > In last year's CF entry, I had made a union within CompressorState. LZ4
> > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
> > ZSTD_CStream).
> >
> > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> > from commit ffd53659c. You're passing both the compression method and level.
> > I think there should be a structure which includes both. In the future, that
> > can also handle additional options. I hope to re-use these same things for
> > wal_compression=method:level.
> >
> > You renamed this:
> >
> > |- COMPR_ALG_LIBZ
> > |-} CompressionAlgorithm;
> > |+ COMPRESSION_GZIP,
> > |+} CompressionMethod;
> >
> > ..But I don't think that's an improvement. If you were to change it, it should
> > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
> > structs and typedefs. zlib is not idential to gzip, which uses a different
> > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.
> >
> > The cf* changes in pg_backup_archiver could be split out into a separate
> > commit. It's strictly a code simplification - not just preparation for more
> > compression algorithms. The commit message should "See also:
> > bf9aa490db24b2334b3595ee33653bf2fe39208c".
> >
> > The changes in 0002 for cfopen_write seem insufficient:
> > |+ if (compressionMethod == COMPRESSION_NONE)
> > |+ fp = cfopen(path, mode, compressionMethod, 0);
> > | else
> > | {
> > | #ifdef HAVE_LIBZ
> > | char *fname;
> > |
> > | fname = psprintf("%s.gz", path);
> > |- fp = cfopen(fname, mode, compression);
> > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel);
> > | free_keep_errno(fname);
> > | #else
> >
> > The only difference between the LIBZ and uncompressed case is the file
> > extension, and it'll be the only difference with LZ4 too. So I suggest to
> > first handle the file extension, and the rest of the code path is not
> > conditional on the compression method. I don't think cfopen_write even needs
> > HAVE_LIBZ - can't you handle that in cfopen_internal() ?
> >
> > This patch rejects -Z0, which ought to be accepted:
> > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
> > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set
> >
> > Your 0003 patch shouldn't reference LZ4:
> > +#ifndef HAVE_LIBLZ4
> > + if (*compressionMethod == COMPRESSION_LZ4)
> > + supports_compression = false;
> > +#endif
> >
> > The 0004 patch renames zlibOutSize to outsize - I think the patch series should
> > be constructed such as to minimize the size of the method-specific patches. I
> > say this anticipating also adding support for zstd. The preliminary patches
> > should have all the boring stuff. It would help for reviewing to keep the
> > patches split up, or to enumerate all the boring things that are being renamed
> > (like change OutputContext to cfp, rename zlibOutSize, ...).
> >
> > 0004: The include should use <lz4.h> and not "lz4.h"
> >
> > freebsd/cfbot is failing.
> >
> > I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> > exercise it more on CI.
>
>
> On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote:
>
> > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> >
> > > You're passing both the compression method and level. I think there should
> > > be a structure which includes both. In the future, that can also handle
> > > additional options.
> >
> > I'm not sure if there's anything worth saving, but I did that last year with
> > 0003-Support-multiple-compression-algs-levels-opts.patch
> > I sent a rebased copy off-list.
> > https://www.postgresql.org/message-id/flat/20210104025321(dot)GA9712(at)telsasoft(dot)com#ca1b9f9d3552c87fa874731cad9d8391
> >
> > | fatal("not built with LZ4 support");
> > | fatal("not built with lz4 support");
> >
> > Please use consistent capitalization of "lz4" - then the compiler can optimize
> > away duplicate strings.
> >
> > > 0004: The include should use <lz4.h> and not "lz4.h"
> >
> > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.
>
>

Attachment Content-Type Size
v7-0002-Introduce-Compressor-API-in-pg_dump.patch text/x-patch 49.7 KB
v7-0001-Prepare-pg_dump-for-additional-compression-method.patch text/x-patch 50.0 KB
v7-0003-Add-LZ4-compression-in-pg_-dump-restore.patch text/x-patch 31.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2022-07-05 13:24:25 Re: Temporary file access API
Previous Message Bharath Rupireddy 2022-07-05 12:42:14 Re: Support load balancing in libpq