Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-01 13:39:14
Message-ID: lsZgBfZRB5w5slcXnKwoL9qgpzdlAC_UwYHXfbj1oioWsNkckwr2BcLvvCi9-x7eV261s8AP5OHwROZ-QYtpeixxt73DQ_d6-LNfghYOaIQ=@pm.me
Lists: pgsql-hackers

------- Original Message -------
On Wednesday, March 1st, 2023 at 12:58 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> I found that e9960732a broke writing of empty gzip-compressed data,
> specifically LOs. pg_dump succeeds, but then the restore fails:
>
> postgres=# SELECT lo_create(1234);
> lo_create | 1234
>
> $ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v
> pg_restore: implied data-only restore
> pg_restore: executing BLOB 1234
> pg_restore: processing BLOBS
> pg_restore: restoring large object with OID 1234
> pg_restore: error: could not uncompress data: (null)
>

Thank you for looking. This was an untested case.
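
For illustration only, here is a minimal sketch of why "no data" still needs
a finished stream. It uses a hypothetical toy container (two magic bytes,
payload, one end byte), not gzip and not the code in the attached patch; all
names (ToyWriter, toy_init, toy_start, toy_write, toy_end, toy_read) and the
finalize_when_empty flag are assumptions made up for this example. The point
it tries to show is that a writer which emits nothing when no payload was
written produces zero bytes, which a reader cannot tell apart from a
truncated stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy container, NOT gzip: 2 magic bytes, payload, 1 end byte. */
#define TOY_MAGIC0   'T'
#define TOY_MAGIC1   'Z'
#define TOY_END      0xFF
#define TOY_OVERHEAD ((size_t) 3)

typedef struct ToyWriter
{
    unsigned char *out;     /* caller-supplied output buffer */
    size_t      cap;        /* capacity of out */
    size_t      len;        /* bytes emitted so far */
    int         started;    /* has the header been written? */
} ToyWriter;

void
toy_init(ToyWriter *w, unsigned char *buf, size_t cap)
{
    assert(w != NULL && buf != NULL);
    w->out = buf;
    w->cap = cap;
    w->len = 0;
    w->started = 0;
}

/* Emit the header once; returns 0 on success, -1 if the buffer is too small. */
int
toy_start(ToyWriter *w)
{
    if (w->started)
        return 0;
    if (w->cap - w->len < 2)
        return -1;
    w->out[w->len++] = TOY_MAGIC0;
    w->out[w->len++] = TOY_MAGIC1;
    w->started = 1;
    return 0;
}

/* Append payload bytes, starting the stream lazily on the first call. */
int
toy_write(ToyWriter *w, const void *data, size_t n)
{
    if (toy_start(w) != 0 || w->cap - w->len < n)
        return -1;
    if (n > 0)
        memcpy(w->out + w->len, data, n);
    w->len += n;
    return 0;
}

/*
 * Finish the stream. With finalize_when_empty == 0 this models the faulty
 * behaviour: if nothing was written, nothing is emitted at all.
 */
int
toy_end(ToyWriter *w, int finalize_when_empty)
{
    if (!w->started && !finalize_when_empty)
        return 0;
    if (toy_start(w) != 0 || w->cap - w->len < 1)
        return -1;
    w->out[w->len++] = TOY_END;
    return 0;
}

/* Validate a stream; on success store the payload length and return 0. */
int
toy_read(const unsigned char *in, size_t n, size_t *payload_len)
{
    if (n < TOY_OVERHEAD || in[0] != TOY_MAGIC0 ||
        in[1] != TOY_MAGIC1 || in[n - 1] != TOY_END)
        return -1;
    *payload_len = n - TOY_OVERHEAD;
    return 0;
}
```

Calling toy_end() with finalize_when_empty set to 0 on a fresh writer leaves
the buffer empty and toy_read() rejects it; with the flag set to 1 the writer
emits the three framing bytes and toy_read() reports a zero-length payload.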

> The inline patch below fixes it, but you won't be able to apply it
> directly, as it's on top of other patches which rename the functions
> back to "Zlib" and rearranges the functions to their original order, to
> allow running:
>
> git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c
>

Please find a patch attached that can be applied directly.

> The current function order avoids 3 lines of declarations, but it's
> obviously pretty useful to be able to run that diff command. I already
> argued for not calling the functions "Gzip" on the grounds that the name
> was inaccurate.

I have no idea why we are back on the naming issue. I stand by the name
because, in my humble opinion, it helps the code reader. There is a certain
uniformity when the compression_spec.algorithm and the compressor
function names match, as the following code sample shows.

    if (compression_spec.algorithm == PG_COMPRESSION_NONE)
        InitCompressorNone(cs, compression_spec);
    else if (compression_spec.algorithm == PG_COMPRESSION_GZIP)
        InitCompressorGzip(cs, compression_spec);
    else if (compression_spec.algorithm == PG_COMPRESSION_LZ4)
        InitCompressorLZ4(cs, compression_spec);

When the reader wants to see what happens when PG_COMPRESSION_XXX
is set, they simply have to search for the XXX part. I think that this is
justification enough for the use of the names.
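
To make the naming argument concrete, here is a self-contained sketch of the
same dispatch pattern. The Demo* and DEMO_* names are hypothetical stand-ins
invented for this example so that it compiles on its own; they are not the
pg_dump definitions. What it tries to show is that the suffix of the enum
value (GZIP, LZ4) is also the suffix of the function that handles it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the algorithm enum and compressor state. */
typedef enum DemoAlgorithm
{
    DEMO_COMPRESSION_NONE,
    DEMO_COMPRESSION_GZIP,
    DEMO_COMPRESSION_LZ4
} DemoAlgorithm;

typedef struct DemoCompressorState
{
    const char *impl;       /* name of the implementation that was selected */
} DemoCompressorState;

/* One initializer per algorithm; the suffix matches the enum suffix. */
void
InitDemoCompressorNone(DemoCompressorState *cs)
{
    cs->impl = "None";
}

void
InitDemoCompressorGzip(DemoCompressorState *cs)
{
    cs->impl = "Gzip";
}

void
InitDemoCompressorLZ4(DemoCompressorState *cs)
{
    cs->impl = "LZ4";
}

/* Select the initializer for an algorithm; returns -1 if it is unknown. */
int
demo_init_compressor(DemoCompressorState *cs, DemoAlgorithm algorithm)
{
    assert(cs != NULL);

    if (algorithm == DEMO_COMPRESSION_NONE)
        InitDemoCompressorNone(cs);
    else if (algorithm == DEMO_COMPRESSION_GZIP)
        InitDemoCompressorGzip(cs);
    else if (algorithm == DEMO_COMPRESSION_LZ4)
        InitDemoCompressorLZ4(cs);
    else
        return -1;
    return 0;
}
```

Searching this sketch for "Gzip" or "GZIP" (case-insensitively) finds both
the branch and the function it calls, which is the property argued for above.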

>
> I'd want to create an empty large object in src/test/sql/largeobject.sql
> to exercise this during pg_upgrade testing. But unfortunately that
> doesn't use -Fc, so this isn't hit. Empty input is an important enough
> test case to justify a tap test, if there's no better way.

Please find attached a test case that exercises this code path.

Cheers,
//Georgios

Attachment: 0001-Properly-gzip-compress-when-no-data-is-available.patch (text/x-patch, 7.8 KB)
