| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
|---|---|
| To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add errdetail() with PID and UID about source of termination signal |
| Date: | 2026-04-06 16:51:40 |
| Message-ID: | 1ba07c75-cc4e-410c-9af1-076feeeb9594@dunslane.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
> On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> Hi Chao,
>
>> I just reviewed v4 again and got a few more comments:
>>
>> 1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
> Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
> (for whatever reason), then two scenarios can happen as far as I'm concerned:
> * another SIGTERM comes in from the same or different uid/pid and it wll be
> reported properly
> * different SIGKILL, but in this case we won't report UID/PID at all
>
> am I missing something or do You have any particular scenario in mind?
> The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
> err reporting facilities.
>
>> 2.
>> syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
> Right, let's make it that way.
>
>> 3.
>> ```
>> @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
>> QueryCancelPending = false;
>> ereport(WARNING,
>> (errmsg("canceling wait for synchronous replication due to user request"),
>> - errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
>> + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
>> + proc_die_sender_pid == 0 ? 0 :
>> + errhint("Signal sent by PID %d, UID %d.",
>> + proc_die_sender_pid, proc_die_sender_uid)
>> + ));
>> SyncRepCancelWait();
>> break;
>> }
>> ```
>>
>> I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
> Right, it was superfluous.
>
> v5 attached.
>
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:
- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport
- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch | text/x-patch | 12.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexander Lakhin | 2026-04-06 17:00:00 | Re: Changing the state of data checksums in a running cluster |
| Previous Message | Melanie Plageman | 2026-04-06 16:50:58 | Re: EXPLAIN: showing ReadStream / prefetch stats |