Decompression bug in astreamer_lz4

From: Mikhail Gribkov <youzhick(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Decompression bug in astreamer_lz4
Date: 2025-06-25 20:07:32
Message-ID: CAMEv5_uQS1Hg6KCaEP2JkrTBbZ-nXQhxomWrhYQvbdzR-zy-wA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

While developing my extension, I faced some problems in using
pg_basebackup/pg_verifybackup on a cluster with additional custom files
saved. The extension saves its own format files, which may have a variety
of sizes and usually quite low compression potential.

The same situation can be reproduced using randomly filled files. Let’s
imagine the extension saves its own statistics in pg_stat:
dd if=/dev/urandom of=$PGDATA/pg_stat/junk bs=1M count=16

Now we’ll run the server and make an lz4-packed backup:
$PGINSTALL/bin/pg_basebackup --no-sync -cfast --target
server:/tmp/server-backup -Xfetch --compress server-lz4

Now we’ll check the backup:
$PGINSTALL/bin/pg_verifybackup -n /tmp/server-backup

The verification process will fail. The exact error may vary, but the most
common version looks like this:
pg_verifybackup: error: checksum mismatch for file "pg_stat/junk" in
archive "base.tar.lz4"
pg_verifybackup: error: "base/4/3431" is present in the manifest but not on
disk
pg_verifybackup: error: "base/5/2620" is present in the manifest but not on
disk
pg_verifybackup: error: "base/1/2617" is present in the manifest but not on
disk
pg_verifybackup: error: "base/4/3576" is present in the manifest but not on
disk
......

Using gzip or zstd compressions in the same setup doesn’t lead to any
problems. Moreover, the lz4 archive itself is fine and it can be
successfully unpacked via cli lz4 unpacker. The problem thus is in reading,
and some investigation done by me and my colleagues lead us to
astreamer_lz4.c file.
Looks like astreamer_lz4_decompressor_content() function is initializing
_out-parameters incorrectly:
next_out = (uint8 *) mystreamer->base.bbs_buffer.data;
avail_out = mystreamer->base.bbs_buffer.maxlen;

, while should be:
next_out = (uint8 *) mystreamer->base.bbs_buffer.data +
mystreamer->bytes_written;
avail_out = mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written;

And this would be consistent with what the gzip version of the function
(astreamer_gzip_decompressor_content()) does.

In the vast majority of cases the mystreamer->bytes_written value is zero
at the moment of next_out/avail_out initialization, thus the problem does
not appear. In fact we were totally unable to reproduce non-zero initial
value of mystreamer->bytes_written without poor-compressible custom format
files.I’m not that deep in the compression algorithms to say exactly why is
that, and I realize that very few users will really face the problem, but
the problem is definitely there.

Attached is a simple patch implementing the above fix. The issue disappears
with it.
What do you think?

--
best regards,
Mikhail A. Gribkov

Attachment Content-Type Size
v1-Fix-lz4-decompressor.patch application/octet-stream 651 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-06-25 21:44:22 Re: Making Row Comparison NULL row member handling more robust during skip scans
Previous Message Jelte Fennema-Nio 2025-06-25 19:59:23 Re: Huge commitfest app update upcoming: Tags, Draft CF, Help page, and automated commitfest creat/open/close