Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-19 16:42:06
Message-ID: 7zu6GYFi72jByVk2SEe_nd8r1DHhwA35vSntAE9sAfiMX9Lfk6bwimkwmu7GDgYaEvL27SE4qF1uK4_LHWQM4H-SXOwkMAI-GV8PKGmED_E=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Thursday, January 19th, 2023 at 4:45 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

>
>
> On 1/18/23 20:05, gkokolatos(at)pm(dot)me wrote:
>
> > ------- Original Message -------
> > On Wednesday, January 18th, 2023 at 3:00 PM, Tomas Vondra tomas(dot)vondra(at)enterprisedb(dot)com wrote:
> >
> > > Hi,
> > >
> > > On 1/16/23 16:14, gkokolatos(at)pm(dot)me wrote:
> > >
> > > > Hi,
> > > >
> > > > I admit I am completely at lost as to what is expected from me anymore.
> >
> > <snip>
> >
> > > Unfortunately, this plays against this patch - I'm certainly in favor of
> > > adding lz4 (and other compression algos) into pg_dump, but if I commit
> > > 0001 we get little benefit, and the other parts actually adding lz4/zstd
> > > are treated as "WIP / for completeness" so it's unclear when we'd get to
> > > commit them.
> >
> > Thank you for your kindness and for taking the time to explain.
> >
> > > So if I could recommend one thing, it'd be to get at least one of those
> > > WIP patches into a shape that's likely committable right after 0001.
> >
> > This was clearly my fault. I misunderstood a suggestion upthread to focus
> > on the first patch of the series and ignore documentation and comments on
> > the rest.
> >
> > Please find v21 to contain 0002 and 0003 in a state which I no longer consider
> > as WIP but worthy of proper consideration. Some guidance on where is best to add
> > documentation in 0002 for the function pointers in CompressFileHandle will
> > be welcomed.
>
>
> This is internal-only API, not meant for use by regular users and/or
> extension authors, so I don't think we need sgml docs. I'd just add
> regular code-level documentation to compress_io.h.
>
> For inspiration see docs for "struct ReorderBuffer" in reorderbuffer.h,
> or "struct _archiveHandle" in pg_backup_archiver.h.
>
> Or what other kind of documentation you had in mind?

This is exactly what I was after. I was between compress_io.c and compress_io.h.
Thank you.

> > > > I had posted v19-0001 for a committer's consideration and v19-000{2,3} for completeness.
> > > > Please find a rebased v20 attached.
> > >
> > > I took a quick look at 0001, so a couple comments (sorry if some of this
> > > was already discussed in the thread):
> >
> > Much appreciated!
> >
> > > 1) I don't think a "refactoring" patch should reference particular
> > > compression algorithms (lz4/zstd), and in particular I don't think we
> > > should have "not yet implemented" messages. We only have a couple other
> > > places doing that, when we didn't have a better choice. But here we can
> > > simply reject the algorithm when parsing the options, we don't need to
> > > do that in a dozen other places.
> >
> > I have now removed lz4/zstd from where they were present with the exception
> > of pg_dump.c which is responsible for parsing.
>
>
> I'm not sure I understand why leave the lz4/zstd in this place?

You are right, it is not obvious. Those were added in 5e73a60488 which is
already committed in master and I didn't want to backtrack. Of course, I am
not opposing in doing so if you wish.

>
> > > 2) I wouldn't reorder the cases in WriteDataToArchive, i.e. I'd keep
> > > "none" at the end. It might make backpatches harder.
> >
> > Agreed. However a 'default' is needed in order to avoid compilation warnings.
> > Also note that 0002 completely does away with cases within WriteDataToArchive.
>
>
> OK, although that's also a consequence of using a "switch" instead of
> plan "if" branches.
>
> Furthermore, I'm not sure we really need the pg_fatal() about invalid
> compression method in these default blocks. I mean, how could we even
> get to these places when the build does not support the algorithm? All
> of this (ReadDataFromArchive, WriteDataToArchive, EndCompressor, ...)
> happens looong after the compressor was initialized and the method
> checked, no? So maybe either this should simply do Assert(false) or use
> a different error message.

I like Assert(false).

> > > 3) While building, I get bunch of warnings about missing cfdopen()
> > > prototype and pg_backup_archiver.c not knowing about cfdopen() and
> > > adding an implicit prototype (so I doubt it actually works).
> >
> > Fixed. cfdopen() got prematurely introduced in 5e73a6048 and then got removed
> > in 69fb29d1af. v20 failed to properly take 69fb29d1af in consideration. Note
> > that cfdopen is removed in 0002 which explains why cfbot didn't complain.
>
>
> OK.
>
> > > 4) "cfp" struct no longer wraps gzFile, but the comment was not updated.
> > > FWIW I'm not sure switching to "void *" is an improvement, maybe it'd be
> > > better to have a "union" of correct types?
> >
> > Please find and updated comment and a union in place of the void *. Also
> > note that 0002 completely does away with cfp in favour of a new struct
> > CompressFileHandle. I maintained the void * there because it is used by
> > private methods of the compressors. 0003 contains such an example with
> > LZ4CompressorState.
>
>
> I wonder if this (and also the previous item) makes sense to keep 0001
> and 0002 or to combine them. The "intermediate" state is a bit annoying.

Agreed. It was initially submitted as one patch. Then it was requested to be
split up in two parts, one to expand the use of the existing API and one to
replace with the new interface. Unfortunately the expansion of usage of the
existing API requires some tweaking, but that is not a very good reason for
the current patch set. I should have done a better job there.

Please find v22 attach which combines back 0001 and 0002. It is missing the
documentation that was discussed above as I wanted to give a quick feedback.
Let me know if you think that the combined version is the one to move forward
with.

Cheers,
//Georgios

>
> > > 5) cfopen/cfdopen are missing comments. cfopen_internal has an updated
> > > comment, but that's a static function while cfopen/cfdopen are the
> > > actual API.
> >
> > Added comments to cfopen/cfdopen.
>
>
> OK.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Attachment Content-Type Size
v22-0002-Add-LZ4-compression-to-pg_dump.patch text/x-patch 29.3 KB
v22-0001-Introduce-Compressor-API-in-pg_dump-and-use-it-t.patch text/x-patch 67.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2023-01-19 17:03:53 Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences
Previous Message Robert Haas 2023-01-19 16:40:53 Re: almost-super-user problems that we haven't fixed yet