| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
| Cc: | Maxim Orlov <orlovmg(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Rework SLRU I/O errors handle |
| Date: | 2026-03-06 16:38:26 |
| Message-ID: | aa7908db-b88e-4bb6-a2bc-5b3cee9a28cb@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 06/03/2026 18:22, Heikki Linnakangas wrote:
> On 06/03/2026 17:33, Álvaro Herrera wrote:
>> I'm not a fan of the split. I think it all these patches should be
>> pushed as a single commit, and avoid introducing xact_errmsg_for_io_error
>> as an exposed function. I think that doesn't make a lot of sense. Each
>> SLRU should have a correct and appropriate error reporting callback.
>
> Agreed.
>
>> The comment added in 0005 is bogus too. It mentions
>> InvalidTransactionId,
>> but the problem is not that the value is 0 but rather that we get no
>> pointer. Also, in all other callbacks the pointer is asserted to not be
>> NULL, so why don't we do the same here, and avoid an error message
>> that's not going to help anyone? I see however that in the patch we're
>> passing a NULL to SlruReportIOError(), which means if you get an IO
>> error with any SLRU other than xact, you're going to get either a crash
>> on the assertion, or (on non-debug builds) a crash on dereferencing the
>> NULL pointer.
>
> Hmm, I thought we could just never pass a NULL pointer, but if a Write
> fails, slru.c has no context where to pull that opaque_data. Perhaps we
> should just not call the callback in that case.
>
> I'm starting to feel that what SlruReportIOError() currently uses as the
> errdetail, could well be the main error message, and the callback could
> provide the errdetail. I.e. swap the errmsg and errdetail we have now.
> That way, we can just leave out the errdetail for Write failures. The
> current errmsg we have for Write failures is pretty bad: "ERROR: could
> not access status of transaction 0". What's currently the errdetail,
> e.g. "Could not read from file \"%s\" at offset %d: %m", is a lot more
> informative.
Attached version that does that.
As another data point, we have a lot of error messages like "could not
open file \"%s\": %m" in the source tree. slru.c was the only place
where that was in the errdetail() part.
- Heikki
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Add-a-callback-for-I-O-error-messages-in-SLRUs.patch | text/x-patch | 32.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrew Dunstan | 2026-03-06 16:41:03 | Re: Allow specifying NULL default in pg_proc.dat for "any" arguments |
| Previous Message | Heikki Linnakangas | 2026-03-06 16:22:42 | Re: Rework SLRU I/O errors handle |