From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
Cc: | Evgeniy Gorbanev <gorbanyoves(at)basealt(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: No error checking when reading from file using zstd in pg_dump |
Date: | 2025-06-16 17:56:45 |
Message-ID: | 644360.1750096605@sss.pgh.pa.us |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Daniel Gustafsson <daniel(at)yesql(dot)se> writes:
> On 16 Jun 2025, at 16:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> This API is actually quite bizarrely defined, if you ask me.
>> Returning the byte count is optional, but if you don't pay
>> attention to the byte count you cannot know if you got any
>> data. At best, that's bug-encouraging.
> Agreed. Given that many of implementations in the gzip support code directly
> return the gzXXX function I suspect it was modeled around GZip but thats an
> unresearched guess. The fact that this returns Z_NULL where the API defines
> NULL is unlikely to ever be an issue, but it also doesn't seem entirely neat..
After looking around a bit more I realized that this API is a complete
disaster: it's not only bug-prone, but there are actual bugs in
multiple callers, eg _ReadBuf() totally fails to detect early EOF as
it intends to. We need to fix it, not slap some band-aids on.
Draft patch attached.
The write_func side is a bit of a mess too. I fixed some obvious API
violations in the attached, but I think we need to do more, because
it seems that zero thought was given to reporting errors sanely.
The callers all assume that strerror() is what to use to report an
incomplete write, but this is not appropriate in every case, for
instance we need to pay attention to gzerror() in gzip cases.
I'm inclined to think that the best answer is to move error reporting
into the write_funcs, and just make the API spec be "write or die".
I've not tackled that here though.
I was depressed to find that Zstd_getc reads one byte into
an otherwise-uninitialized int and then returns the whole int.
Even if we're lucky enough for the variable to start off zero,
this would fail utterly on big-endian machines. The only
reason we've not noticed is that the regression tests do not
reach Zstd_getc, nor most of the other getc_func implementations.
Now I'm afraid to look at the rest of the compress_io.h API.
If it's as badly written as these parts, there are more bugs
to be found.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v1-fix-broken-read_func-API.patch | text/x-diff | 7.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Yan Haibo | 2025-06-16 18:03:20 | 回复: Fix potential overflow risks from wcscpy and sprintf |
Previous Message | Tom Lane | 2025-06-16 17:36:20 | Re: Per-role disabling of LEAKPROOF requirements for row-level security? |