Re: Add LZ4 compression in pg_dump

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: gkokolatos(at)pm(dot)me, Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-13 20:21:11
Message-ID: 3da84c90-7e12-75ee-a582-025bc68dd098@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/11/23 11:50, gkokolatos(at)pm(dot)me wrote:
> ------- Original Message -------
> On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
>> Hello,
>> 23.02.2023 23:24, Tomas Vondra wrote:
>>
>>> On 2/23/23 16:26, Tomas Vondra wrote:
>>>
>>>> Thanks for v30 with the updated commit messages. I've pushed 0001 after
>>>> fixing a comment typo and removing (I think) an unnecessary change in an
>>>> error message.
>>>>
>>>> I'll give the buildfarm a bit of time before pushing 0002 and 0003.
>>>
>>> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
>>> and marked the CF entry as committed. Thanks for the patch!
>>>
>>> I wonder how difficult would it be to add the zstd compression, so that
>>> we don't have the annoying "unsupported" cases.
>>
>>
>> With the patch 0003 committed, a single warning -Wtype-limits appeared in the
>> master branch:
>> $ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
>> compress_lz4.c: In function ‘LZ4File_gets’:
>> compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is
>> always false [-Wtype-limits]
>> 492 | if (dsize < 0)
>> |
>
> Thank you Alexander. Please find attached an attempt to address it.
>
>> (I wonder, is it accidental that there no other places that triggers
>> the warning, or some buildfarm animals had this check enabled before?)
>
> I can not answer about the buildfarms. Do you think that adding an explicit
> check for this warning in meson would help? I am a bit uncertain as I think
> that type-limits are included in extra.
>
> @@ -1748,6 +1748,7 @@ common_warning_flags = [
> '-Wshadow=compatible-local',
> # This was included in -Wall/-Wformat in older GCC versions
> '-Wformat-security',
> + '-Wtype-limits',
> ]
>
>>
>> It is not a false positive as can be proved by the 002_pg_dump.pl modified as
>> follows:
>> - program => $ENV{'LZ4'},
>>
>> + program => 'mv',
>>
>> args => [
>>
>> - '-z', '-f', '--rm',
>> "$tempdir/compression_lz4_dir/blobs.toc",
>> "$tempdir/compression_lz4_dir/blobs.toc.lz4",
>> ],
>> },
>
> Correct, it is not a false positive. The existing testing framework provides
> limited support for exercising error branches. Especially so when those are
> dependent on generated output.
>
>> A diagnostic logging added shows:
>> LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615
>>
>> and pg_restore fails with:
>> error: invalid line in large object TOC file
>> ".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": "????"
>
> It is a good thing that the restore fails with bad input. Yet it should
> have failed earlier. The attached makes certain it does fail earlier.
>

Thanks for the patch.

I did look if there are other places that might have the same issue, and
I think there are - see attached 0002. For example LZ4File_write is
declared to return size_t, but then it also does

if (LZ4F_isError(status))
{
fs->errcode = status;
return -1;
}

That won't work :-(

And these issues may not be restricted to lz4 code - Gzip_write is
declared to return size_t, but it does

return gzwrite(gzfp, ptr, size);

and gzwrite returns int. Although, maybe that's correct, because
gzwrite() is "0 on error" so maybe this is fine ...

However, Gzip_read assigns gzread() to size_t, and that does not seem
great. It probably will still trigger the following pg_fatal() because
it'd be very lucky to match the expected 'size', but it's confusing.

I wonder whether CompressorState should use int or size_t for the
read_func/write_func callbacks. I guess no option is perfect, i.e. no
data type will work for all compression libraries we might use (lz4 uses
int while lz4f uses size_t, to there's that).

It's a bit weird the "open" functions return int and the read/write
size_t. Maybe we should stick to int, which is what the old functions
(cfwrite etc.) did.

But I think the actual problem here is that the API does not clearly
define how errors are communicated. I mean, it's nice to return the
value returned by the library function without "mangling" it by
conversion to size_t, but what if the libraries communicate errors in
different way? Some may return "0" while others may return "-1".

I think the right approach is to handle all library errors and not just
let them through. So Gzip_write() needs to check the return value, and
either call pg_fatal() or translate it to an error defined by the API.

For example we might say "returns 0 on error" and then translate all
library-specific errors to that.

While looking at the code I realized a couple function comments don't
say what's returned in case of error, etc. So 0004 adds those.

0003 is a couple minor assorted comments/questions:

- Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c?

- Why are LZ4 buffer sizes different (ZLIB has both 4kB)?

- I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible
for LZ4F_compressBound to return value this small (especially for 16kB
input buffer)?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0004-comment-improvements.patch text/x-patch 2.9 KB
0003-questions.patch text/x-patch 1.5 KB
0002-more-size_t-places.patch text/x-patch 1.8 KB
0001-Respect-return-type-of-LZ4File_read_internal.patch text/x-patch 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-03-13 20:22:29 Re: Transparent column encryption
Previous Message Peter Eisentraut 2023-03-13 20:21:04 Re: Transparent column encryption