Re: Add LZ4 compression in pg_dump

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: gkokolatos(at)pm(dot)me, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-01 15:52:49
Message-ID: b0ced9ea-e92d-f9d1-7a2f-075881b755b2@enterprisedb.com
Lists: pgsql-hackers

On 3/1/23 14:39, gkokolatos(at)pm(dot)me wrote:
> ------- Original Message -------
> On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
>> I found that e9960732a broke writing of empty gzip-compressed data,
>> specifically LOs. pg_dump succeeds, but then the restore fails:
>>
>> postgres=# SELECT lo_create(1234);
>> lo_create | 1234
>>
>> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v
>> pg_restore: implied data-only restore
>> pg_restore: executing BLOB 1234
>> pg_restore: processing BLOBS
>> pg_restore: restoring large object with OID 1234
>> pg_restore: error: could not uncompress data: (null)
>>
>
> Thank you for looking. This was an untested case.
>

Yeah :-(

>> The inline patch below fixes it, but you won't be able to apply it
>> directly, as it's on top of other patches which rename the functions
>> back to "Zlib" and rearrange the functions into their original order, to
>> allow running:
>>
>> git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>>
>
> Please find a patch attached that can be applied directly.
>
>> The current function order avoids 3 lines of declarations, but it's
>> obviously pretty useful to be able to run that diff command. I already
>> argued for not calling the functions "Gzip" on the grounds that the name
>> was inaccurate.
>
> I have no idea why we are back on the naming issue. I stand by the name
> because, in my humble opinion, it helps the code reader. There is a certain
> uniformity when the compression_spec.algorithm and the compressor
> functions match, as the following code sample shows.
>
> if (compression_spec.algorithm == PG_COMPRESSION_NONE)
> 	InitCompressorNone(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
> 	InitCompressorGzip(cs, compression_spec);
> else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
> 	InitCompressorLZ4(cs, compression_spec);
>
> When readers want to see what happens when PG_COMPRESSION_XXX is set,
> they simply have to search for the XXX part. I think that this is
> justification enough for the use of the names.
>

I don't recall the previous discussion about the naming, but I'm not
sure why it would be inaccurate. We call it 'gzip' pretty much
everywhere, and I agree with Georgios that it helps to make this
consistent with the PG_COMPRESSION_ stuff.
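
To make the naming argument concrete, here is a minimal C sketch of the name-matched dispatch quoted above. The enum, both structs, the initialized_for field and the InitCompressor wrapper are simplified, hypothetical stand-ins, not pg_dump's actual definitions; only the PG_COMPRESSION_* constants and the InitCompressorNone/Gzip/LZ4 names come from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real pg_dump/common types. */
typedef enum
{
	PG_COMPRESSION_NONE,
	PG_COMPRESSION_GZIP,
	PG_COMPRESSION_LZ4
} pg_compress_algorithm;

typedef struct
{
	pg_compress_algorithm algorithm;
} pg_compress_specification;

typedef struct
{
	int			initialized_for;	/* algorithm whose initializer ran, or -1 */
} CompressorState;

/*
 * Each initializer is named after the PG_COMPRESSION_XXX value it serves,
 * so searching for "XXX" finds both the constant and its implementation.
 */
static void
InitCompressorNone(CompressorState *cs, const pg_compress_specification spec)
{
	(void) spec;
	cs->initialized_for = PG_COMPRESSION_NONE;
}

static void
InitCompressorGzip(CompressorState *cs, const pg_compress_specification spec)
{
	(void) spec;
	cs->initialized_for = PG_COMPRESSION_GZIP;
}

static void
InitCompressorLZ4(CompressorState *cs, const pg_compress_specification spec)
{
	(void) spec;
	cs->initialized_for = PG_COMPRESSION_LZ4;
}

/* Hypothetical wrapper around the dispatch; returns -1 for unknown values. */
static int
InitCompressor(CompressorState *cs, const pg_compress_specification spec)
{
	assert(cs != NULL);
	cs->initialized_for = -1;

	if (spec.algorithm == PG_COMPRESSION_NONE)
		InitCompressorNone(cs, spec);
	else if (spec.algorithm == PG_COMPRESSION_GZIP)
		InitCompressorGzip(cs, spec);
	else if (spec.algorithm == PG_COMPRESSION_LZ4)
		InitCompressorLZ4(cs, spec);
	else
		return -1;
	return 0;
}
```

With this convention, grepping for "LZ4" turns up both PG_COMPRESSION_LZ4 and InitCompressorLZ4, which is the property Georgios describes.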

The one thing that concerned me while reviewing it earlier was that it
might make backpatching harder. But I think that's mostly irrelevant,
given all the other changes.

>>
>> I'd want to create an empty large object in src/test/sql/largeobject.sql
>> so that this gets tested during pg_upgrade. But unfortunately that
>> doesn't use -Fc, so this code path isn't hit. Empty input is an important
>> enough test case to justify a TAP test, if there's no better way.
>
> Please find in the attached a test case that exercises this codepath.
>

Thanks. That seems correct to me, but I find it somewhat confusing,
because we now have

DeflateCompressorInit vs. InitCompressorGzip

DeflateCompressorEnd vs. EndCompressorGzip

DeflateCompressorData - The name doesn't really say what it does (it would
be better to have a verb in there, I think).

I wonder if we can make this clearer somehow.

Also, InitCompressorGzip says this:

	/*
	 * If the caller has defined a write function, prepare the necessary
	 * state. Avoid initializing during the first write call, because End
	 * may be called without ever writing any data.
	 */
	if (cs->writeF)
		DeflateCompressorInit(cs);

Does it actually make sense to not have writeF defined in some cases?
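
The comment in InitCompressorGzip is about the empty large object case reported above: if stream setup is deferred until the first write, an End call with no preceding write has no stream to finish, and the restore side then finds no valid compressed data. Below is a toy C model of that failure mode, assuming a made-up format in which a valid stream is a header byte and a trailer byte around the payload (loosely analogous to a zlib stream still carrying a header and checksum when the input is empty). None of the toy_* names exist in pg_dump; this is an illustration, not the actual compressor code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy format: HEADER payload... TRAILER */
#define TOY_HEADER  'H'
#define TOY_TRAILER 'T'
#define TOY_BUFSZ   64

typedef struct ToyCompressor
{
	char		out[TOY_BUFSZ];	/* bytes "written" to the archive */
	size_t		len;
	int			initialized;	/* has the stream header been emitted? */
} ToyCompressor;

static void
toy_emit(ToyCompressor *c, char byte)
{
	assert(c->len < TOY_BUFSZ);
	c->out[c->len++] = byte;
}

static void
toy_init_stream(ToyCompressor *c)
{
	toy_emit(c, TOY_HEADER);
	c->initialized = 1;
}

/* lazy = 1 defers stream setup to the first write (the buggy pattern). */
static void
toy_begin(ToyCompressor *c, int lazy)
{
	memset(c, 0, sizeof(*c));
	if (!lazy)
		toy_init_stream(c);
}

static void
toy_write(ToyCompressor *c, const char *data, size_t n)
{
	if (!c->initialized)
		toy_init_stream(c);
	for (size_t i = 0; i < n; i++)
		toy_emit(c, data[i]);
}

/* Finish whatever stream was started; with lazy setup and no writes,
 * there is nothing to finish, so no stream is produced at all. */
static void
toy_end(ToyCompressor *c)
{
	if (c->initialized)
		toy_emit(c, TOY_TRAILER);
}

/* Reader side: payload length, or -1 if the stream is malformed
 * (the analogue of "could not uncompress data"). */
static long
toy_read(const ToyCompressor *c)
{
	if (c->len < 2 || c->out[0] != TOY_HEADER ||
		c->out[c->len - 1] != TOY_TRAILER)
		return -1;
	return (long) (c->len - 2);
}
```

Calling toy_begin with lazy set to 0 and then toy_end yields a valid empty stream (toy_read returns 0), while lazy setup with no writes leaves nothing for the reader (toy_read returns -1), which is why eager initialization matters for empty large objects.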

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
