Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-05-05 14:13:28
Message-ID: P-Dl7-QRVPmjv35H2n-Wnj9amDG_MUEb0-jtlzs05n5z03lq0IZNTsI5h4DjTmLekfmKexgFBQhmVXtUO402vAIFx0e002lIzsb8O8Hi1wA=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Friday, May 5th, 2023 at 3:23 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> On 2023-05-05 Fr 06:02, gkokolatos(at)pm(dot)me wrote:
>
>> ------- Original Message -------
>> On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin
>> [<exclusion(at)gmail(dot)com>](mailto:exclusion(at)gmail(dot)com)
>> wrote:
>>
>>> 23.03.2023 20:10, Tomas Vondra wrote:
>>>
>>>> So pushed all three parts, after updating the commit messages a bit.
>>>>
>>>> This leaves the empty-data issue (which we have a fix for) and the
>>>> switch to LZ4F. And then the zstd part.
>>>
>>> I'm sorry that I haven't noticed/checked that before, but when trying to
>>> perform check-world with Valgrind I've discovered another issue presumably
>>> related to LZ4File_gets().
>>> When running under Valgrind:
>>> PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
>>> I get:
>>> ...
>>> 07:07:11.683 ok 1939 - compression_lz4_dir: glob check for
>>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
>>> # Running: pg_restore --jobs=2 --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql
>>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir
>>>
>>> ==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised value(s)
>>> ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548)
>>> ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal (strops.c:41)
>>> ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95)
>>> ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28)
>>> ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458)
>>> ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData (pg_backup_directory.c:422)
>>> ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry (pg_backup_archiver.c:882)
>>> ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive (pg_backup_archiver.c:699)
>>> ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414)
>>> ==00:00:00:00.579 2811926==
>>> ...
>>>
>>> It looks like the line variable returned by gets_func() here is not
>>> null-terminated:
>>> while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)
>>>
>>> {
>>> ...
>>> if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2)
>>> ...
>>> And Valgrind doesn't like it.
>>
>> Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
>> gets() when it should have been modeled after fgets().
>>
>> Please find a patch attached to address it.
>
> Isn't using memset here a bit wasteful? Why not just put a null at the end after calling LZ4Stream_read_internal(), which tells you how many bytes it has written?

Good point. I thought about it before submitting the patch. I concluded that given the complexity and operations involved in LZ4Stream_read_internal() and the rest of the pg_dump/pg_restore code, the memset() call will be negligible. However from the readability point of view, the function is a bit cleaner with the memset().

I will not object to any suggestion though, as this is a very trivial point. Please find attached a v2 of the patch following the suggested approach.

Cheers,

//Georgios

> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB:
> https://www.enterprisedb.com

Attachment Content-Type Size
v2-0001-Null-terminate-the-output-buffer-of-LZ4Stream_get.patch text/x-patch 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-05-05 14:23:31 Re: Add two missing tests in 035_standby_logical_decoding.pl
Previous Message Padmavathi G 2023-05-05 13:56:56 Re: Tables getting stuck at 's' state during logical replication