Re: Add LZ4 compression in pg_dump

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: gkokolatos(at)pm(dot)me, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-13 21:47:12
Message-ID: c1493de3-7d97-c3c9-dd39-5a781e10a7b8@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Justin,

Thanks for the patch.

On 3/8/23 02:45, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 04:52:49PM +0100, Tomas Vondra wrote:
>> Thanks. That seems correct to me, but I find it somewhat confusing,
>> because we now have
>>
>> DeflateCompressorInit vs. InitCompressorGzip
>>
>> DeflateCompressorEnd vs. EndCompressorGzip
>>
>> DeflateCompressorData - The name doesn't really say what it does (would
>> be better to have a verb in there, I think).
>>
>> I wonder if we can make this somehow clearer?
>
> To move things along, I updated Georgios' patch:
>
> Rename DeflateCompressorData() to DeflateCompressorCommon();

Hmmm, I don't find "common" any clearer than "data" :-( There needs to
at least be a comment explaining what "common" does.

> Rearrange functions to their original order allowing a cleaner diff to the prior code;

OK. I wasn't very enthusiastic about this initially, but after thinking
about it a bit I think it's meaningful to make diffs clearer. But I
don't see much difference with/without the patch. The

git diff --diff-algorithm=minimal -w
e9960732a~:src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_gzip.c

Produces ~25k diff with/without the patch. What am I doing wrong?

> Change pg_fatal() to an assertion+comment;

Yeah, that's reasonable. I'd even ditch the assert/comment, TBH. We
could add such protections against "impossible" stuff to a zillion other
places and the confusion likely outweighs the benefits.

> Update the commit message and fix a few typos;
>

Thanks. I don't want to annoy you too much, but could you split the
patch into the "empty-data" fix and all the other changes (rearranging
functions etc.)? I'd rather not mix those in the same commit.

>> Also, InitCompressorGzip says this:
>>
>> /*
>> * If the caller has defined a write function, prepare the necessary
>> * state. Avoid initializing during the first write call, because End
>> * may be called without ever writing any data.
>> */
>> if (cs->writeF)
>> DeflateCompressorInit(cs);
>>
>> Does it actually make sense to not have writeF defined in some cases?
>
> InitCompressor is being called for either reading or writing, either of
> which could be null:
>
> src/bin/pg_dump/pg_backup_custom.c: ctx->cs = AllocateCompressor(AH->compression_spec,
> src/bin/pg_dump/pg_backup_custom.c- NULL,
> src/bin/pg_dump/pg_backup_custom.c- _CustomWriteFunc);
> --
> src/bin/pg_dump/pg_backup_custom.c: cs = AllocateCompressor(AH->compression_spec,
> src/bin/pg_dump/pg_backup_custom.c- _CustomReadFunc, NULL);
>
> It's confusing that the comment says "Avoid initializing...". What it
> really means is "Initialize eagerly...". But that makes more sense in
> the context of the commit message for this bugfix than in a comment. So
> I changed that too.
>
> + /* If deflation was initialized, finalize it */
> + if (cs->private_data)
> + DeflateCompressorEnd(AH, cs);
>
> Maybe it'd be more clear if this used "if (cs->writeF)", like in the
> init function ?
>

Yeah, if the two checks are equivalent, it'd be better to stick to the
same check everywhere.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Regina Obe 2023-03-13 21:57:57 RE: Ability to reference other extensions by schema in extension scripts
Previous Message Andres Freund 2023-03-13 21:45:29 Re: meson: Non-feature feature options