| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add errdetail() with PID and UID about source of termination signal |
| Date: | 2026-02-26 09:25:47 |
| Message-ID: | CAKZiRmxvdYXhO1JrVLW3LkuoaTjYpKmhOzzOCOhw4wDM3rYRMQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
Hi Chao,
> I just reviewed v4 again and got a few more comments:
>
> 1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
(for whatever reason), then two scenarios can happen as far as I'm concerned:
* another SIGTERM comes in from the same or different uid/pid and it wll be
reported properly
* different SIGKILL, but in this case we won't report UID/PID at all
am I missing something or do You have any particular scenario in mind?
The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
err reporting facilities.
> 2.
> syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
Right, let's make it that way.
> 3.
> ```
> @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
> QueryCancelPending = false;
> ereport(WARNING,
> (errmsg("canceling wait for synchronous replication due to user request"),
> - errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
> + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
> + proc_die_sender_pid == 0 ? 0 :
> + errhint("Signal sent by PID %d, UID %d.",
> + proc_die_sender_pid, proc_die_sender_uid)
> + ));
> SyncRepCancelWait();
> break;
> }
> ```
>
> I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
Right, it was superfluous.
v5 attached.
-J.
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Add-errdetail-with-PID-and-UID-about-source-of-te.patch | text/x-patch | 12.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexander Kukushkin | 2026-02-26 09:29:29 | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Previous Message | Chao Li | 2026-02-26 09:21:13 | Re: guc: make dereference style consistent in check_backtrace_functions |