Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-31 09:19:47
Message-ID: f4qyEM_TZClMk3_dC2Joc7KELxdu0j-otmT-AHJy25vpCkkXp5aGuZ08TwabgHTieiIqYwgtnBKe7muIjHIwHKn4ZimuKc62rualr8FJ9gg=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Wednesday, March 29th, 2023 at 12:02 AM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

>
>
> On 3/28/23 18:07, gkokolatos(at)pm(dot)me wrote:
>
> > ------- Original Message -------
> > On Friday, March 24th, 2023 at 10:30 AM, gkokolatos(at)pm(dot)me gkokolatos(at)pm(dot)me wrote:
> >
> > > ------- Original Message -------
> > > On Thursday, March 23rd, 2023 at 6:10 PM, Tomas Vondra tomas(dot)vondra(at)enterprisedb(dot)com wrote:
> > >
> > > > This leaves the empty-data issue (which we have a fix for) and the
> > > > switch to LZ4F. And then the zstd part.
> > >
> > > Please expect promptly a patch for the switch to frames.
> >
> > Please find the expected patch attached. Note that the bulk of the
> > patch is code unification, variable renaming to something more
> > appropriate, and comment addition. These are changes that are not
> > strictly necessary to switch to LZ4F. I do believe that are
> > essential for code hygiene after the switch and they do belong
> > on the same commit.
>
>
> I think the patch is fine, but I'm wondering if the renames shouldn't go
> a bit further. It removes references to LZ4File struct, but there's a
> bunch of functions with LZ4File_ prefix. Why not to simply use LZ4_
> prefix? We don't have GzipFile either.
>
> Sure, it might be a bit confusing because lz4.h uses LZ4_ prefix, but
> then we probably should not define LZ4_compressor_init ...

This is a good point. The initial thought was that since lz4.h is now
removed, such ambiguity will not be present. In v2 of the patch the
function is renamed to `LZ4State_compression_init` since this name
describes better its purpose. It initializes the LZ4State for
compression.

As for the LZ4File_ prefix, I have no objections. Please find the
prefix changed to LZ4Stream_. For the record, the word 'File' is not
unique to the lz4 implementation. The common data structure used by
the API in compress_io.h:

typedef struct CompressFileHandle CompressFileHandle;

The public functions for this API are named:

InitCompressFileHandle
InitDiscoverCompressFileHandle
EndCompressFileHandle

And within InitCompressFileHandle the pattern is:

if (compression_spec.algorithm == PG_COMPRESSION_NONE)
InitCompressFileHandleNone(CFH, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
InitCompressFileHandleGzip(CFH, compression_spec);
else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
InitCompressFileHandleLZ4(CFH, compression_spec);

It was felt that a prefix was required due to the inclusion 'lz4.h'
header where naming functions as 'LZ4_' would be wrong. The prefix
'LZ4File_' seemed to be in line with the naming of the rest of
the relevant functions and structures. Other compressions, gzip and
none, did not face the same issue.

To conclude, I think that having a prefix is slightly preferred
over not having one. Since the prefix `LZ4File_` is not desired,
I propose `LZ4Stream_` in v2.

I will not object to dismissing the argument and drop `File` from
the prefix, if so requested.

>
> Also, maybe the comments shouldn't use "File API" when compress_io.c
> calls that "Compressed stream API".

Done.

Cheers,
//Georgios

>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Attachment Content-Type Size
v2-0001-Use-LZ4-frames-in-pg_dump-s-compressor-API.patch text/x-patch 18.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-03-31 09:26:27 Re: PGdoc: add ID attribute to create_publication.sgml
Previous Message Daniel Gustafsson 2023-03-31 09:14:20 Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert