Re: Add LZ4 compression in pg_dump

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-05-08 00:31:06
Message-ID: ZFhCyn4Gm2eu60rB@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 05, 2023 at 02:13:28PM +0000, gkokolatos(at)pm(dot)me wrote:
> Good point. I thought about it before submitting the patch. I
> concluded that given the complexity and operations involved in
> LZ4Stream_read_internal() and the rest of t he pg_dump/pg_restore
> code, the memset() call will be negligible. However from the
> readability point of view, the function is a bit cleaner with the
> memset().
>
> I will not object to any suggestion though, as this is a very
> trivial point. Please find attached a v2 of the patch following the
> suggested approach.

Hmm. I was looking at this patch, and what you are trying to do
sounds rather right to keep a parallel with the gzip and zstd code
paths.

Looking at the code of gzread.c, gzgets() enforces a null-termination
on the string read. Still, isn't that something we'd better enforce
in read_none() as well? compress_io.h lists this as a requirement of
the callback, and Zstd_gets() does so already. read_none() does not
enforce that, unfortunately.

+ /* No work needs to be done for a zero-sized output buffer */
+ if (size <= 0)
+ return 0;

Indeed. This should be OK.

- ret = LZ4Stream_read_internal(state, ptr, size, true);
+ Assert(size > 1);

The addition of this assertion is a bit surprising, and this is
inconsistent with Zstd_gets where a length of 1 is authorized. We
should be more consistent across all the callbacks, IMO, not less, so
as we apply the same API contract across all the compression methods.

While testing this patch, I have triggered an error pointing out that
the decompression path of LZ4 is broken for table data. I can
reproduce that with a dump of the regression database, as of:
make installcheck
pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
createdb regress_lz4
pg_restore --format=d -d regress_lz4 dump_lz4
pg_restore: error: COPY failed for table "clstr_tst": ERROR: extra data after last expected column
CONTEXT: COPY clstr_tst, line 15: "32 6 seis xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy..."
pg_restore: warning: errors ignored on restore: 1

This does not show up with gzip or zstd, and the patch does not
influence the result. In short it shows up with and without the
patch, on HEAD. That does not look really stable :/
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2023-05-08 00:48:25 Re: 2023-05-11 release announcement draft
Previous Message Peter Smith 2023-05-08 00:29:50 Re: PGDOCS - Replica Identity quotes