Re: astreamer fixes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: astreamer fixes
Date: 2026-03-28 15:15:41
Message-ID: 1479929.1774710941@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> After the recent rash of fixes to the astreamer code, I thought it might
> be a good idea to take a closer look for more issues. The attached
> proposes six fixes. Full disclosure: I found two of these (those in
> astreamer_tar_parser_free() and astreamer_extractor_content()), and
> Claude found the rest. I believe all those it found are indeed things
> that should be fixed (and backpatched). The wrong data pointer issue is
> one I suspect it would have been quite hard to find.

Bleah ...

Your changes in astreamer_file.c and astreamer_tar.c are clearly
correct fixes. I fear that astreamer_gzip.c is still several bricks
shy of a load though: looks like Claude noticed some issues all right,
but its fixes are wrong/incomplete. Reading the documentation in
/usr/include/zlib.h, I notice:

1. I do not think it's possible to get Z_STREAM_END from inflate()
in astreamer_gzip_decompressor_content, because we don't tell it
that we've reached end-of-stream. So the proposed changes there
are incorrect. We should indeed switch to whitelisting rather than
blacklisting result codes, but we should not expect Z_STREAM_END there.

2. What Claude noticed, I think, but failed to correct accurately, is
that astreamer_gzip_decompressor_finalize needs to invoke inflate()
with Z_FINISH, and pump it until it returns Z_STREAM_END. The
code as it stands is probably failing to produce the last few bytes
of decompressed output in many cases. We've not noticed because that
just results in truncating the undefined post-tar-trailer junk.

3. It sure looks to me like astreamer_lz4_decompressor_finalize and
astreamer_zstd_decompressor_finalize have related bugs. There's no
provision in them for flushing any buffered data out of those
libraries, either.

Maybe we should instrument astreamer_tar_parser_finalize to report
exactly how much trailer data it got, in order to check whether this
apparent bug is real. If the result is different between an
uncompressed tar file and the same file compressed, then it's
broken.
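
The comparison suggested above could be prototyped outside the server with
something like the following sketch. It assumes that "trailer data" means
everything from the first all-zero header block to the end of the stream,
and it only understands plain octal size fields (no GNU base-256 sizes, no
pax extensions). tar_trailer_len, build_demo_tar and demo_trailer_len are
made-up names, not astreamer functions.

```c
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512

/* True if the 512-byte block at p contains only zero bytes. */
static int
block_is_zero(const unsigned char *p)
{
    for (size_t i = 0; i < TAR_BLOCK; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Parse an octal field of at most n bytes, skipping leading spaces. */
static size_t
parse_octal(const unsigned char *p, size_t n)
{
    size_t      v = 0;
    size_t      i = 0;

    while (i < n && p[i] == ' ')
        i++;
    for (; i < n && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (size_t) (p[i] - '0');
    return v;
}

/*
 * Walk member headers until the first all-zero header block and return the
 * number of bytes from there to the end of the buffer.  Returns 0 if no
 * such block is found or a member runs past the end.
 */
static size_t
tar_trailer_len(const unsigned char *buf, size_t len)
{
    size_t      off = 0;

    while (off + TAR_BLOCK <= len)
    {
        size_t      size;
        size_t      padded;

        if (block_is_zero(buf + off))
            return len - off;
        size = parse_octal(buf + off + 124, 12);    /* ustar size field */
        padded = (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
        if (padded > len - off - TAR_BLOCK)
            return 0;
        off += TAR_BLOCK + padded;
    }
    return 0;
}

/* Build a toy archive: one 5-byte member followed by two zero blocks. */
static size_t
build_demo_tar(unsigned char *buf, size_t cap)
{
    if (cap < 4 * TAR_BLOCK)
        return 0;
    memset(buf, 0, 4 * TAR_BLOCK);
    memcpy(buf, "f", 1);                        /* member name */
    memcpy(buf + 124, "00000000005", 11);       /* size = 5, octal */
    memcpy(buf + TAR_BLOCK, "hello", 5);        /* member data */
    return 4 * TAR_BLOCK;
}

/* Trailer length seen when the last `cut` bytes of the toy archive are lost. */
static size_t
demo_trailer_len(size_t cut)
{
    unsigned char tar[4 * TAR_BLOCK];
    size_t      n = build_demo_tar(tar, sizeof(tar));

    return tar_trailer_len(tar, n - cut);
}
```

On the toy archive, demo_trailer_len(0) reports the full 1024-byte trailer,
while demo_trailer_len(324) reports 700. A gap of that kind between the
uncompressed and decompressed paths is what the proposed instrumentation
would be looking for.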

regards, tom lane
