Re: Add LZ4 compression in pg_dump

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: gkokolatos(at)pm(dot)me, Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-05-05 13:23:37
Message-ID: 71868915-0345-1b7b-f3d6-ec66547c662a@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2023-05-05 Fr 06:02, gkokolatos(at)pm(dot)me wrote:
>
>
>
>
> ------- Original Message -------
> On Friday, May 5th, 2023 at 8:00 AM, Alexander Lakhin<exclusion(at)gmail(dot)com> wrote:
>
>
>>
>> 23.03.2023 20:10, Tomas Vondra wrote:
>>
>>> So pushed all three parts, after updating the commit messages a bit.
>>>
>>> This leaves the empty-data issue (which we have a fix for) and the
>>> switch to LZ4F. And then the zstd part.
>>
>> I'm sorry that I haven't noticed/checked that before, but when trying to
>> perform check-world with Valgrind I've discovered another issue presumably
>> related to LZ4File_gets().
>> When running under Valgrind:
>> PROVE_TESTS=t/002_pg_dump.pl make check -C src/bin/pg_dump/
>> I get:
>> ...
>> 07:07:11.683 ok 1939 - compression_lz4_dir: glob check for
>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir/*.dat.lz4
>> # Running: pg_restore --jobs=2 --file=.../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir.sql
>> .../src/bin/pg_dump/tmp_check/tmp_test_HB6A/compression_lz4_dir
>>
>> ==00:00:00:00.579 2811926== Conditional jump or move depends on uninitialised value(s)
>> ==00:00:00:00.579 2811926== at 0x4853376: rawmemchr (vg_replace_strmem.c:1548)
>> ==00:00:00:00.579 2811926== by 0x4C96A67: _IO_str_init_static_internal (strops.c:41)
>> ==00:00:00:00.579 2811926== by 0x4C693A2: _IO_strfile_read (strfile.h:95)
>> ==00:00:00:00.579 2811926== by 0x4C693A2: __isoc99_sscanf (isoc99_sscanf.c:28)
>> ==00:00:00:00.579 2811926== by 0x11DB6F: _LoadLOs (pg_backup_directory.c:458)
>> ==00:00:00:00.579 2811926== by 0x11DD1E: _PrintTocData (pg_backup_directory.c:422)
>> ==00:00:00:00.579 2811926== by 0x118484: restore_toc_entry (pg_backup_archiver.c:882)
>> ==00:00:00:00.579 2811926== by 0x1190CC: RestoreArchive (pg_backup_archiver.c:699)
>> ==00:00:00:00.579 2811926== by 0x10F25D: main (pg_restore.c:414)
>> ==00:00:00:00.579 2811926==
>> ...
>>
>> It looks like the line variable returned by gets_func() here is not
>> null-terminated:
>> while ((CFH->gets_func(line, MAXPGPATH, CFH)) != NULL)
>>
>> {
>> ...
>> if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, lofname) != 2)
>> ...
>> And Valgrind doesn't like it.
>>
> Valgrind is correct to not like it. LZ4Stream_gets() got modeled after
> gets() when it should have been modeled after fgets().
>
> Please find a patch attached to address it.
>
>

Isn't using memset here a bit wasteful? Why not just put a null at the
end after calling LZ4Stream_read_internal(), which tells you how many
bytes it has written?

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-05-05 13:36:41 MERGE lacks ruleutils.c decompiling support!?
Previous Message Robert Sjöblom 2023-05-05 13:17:31 [DOC] Update ALTER SUBSCRIPTION documentation