| From: | jie wang <jugierwang(at)gmail(dot)com> |
|---|---|
| To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
| Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add errdetail() with PID and UID about source of termination signal |
| Date: | 2026-04-07 03:55:15 |
| Message-ID: | CAJnZyeDui4YFseMKUc93bMHyY8J5cpFXRoVqNPsZQ_kWhLcVjw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> 于2026年4月7日周二 00:51写道:
>
> On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
> > On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> >
> > Hi Chao,
> >
> >> I just reviewed v4 again and got a few more comments:
> >>
> >> 1. This patch only set the global proc_die_sender_pid/uid to 0 at
> startup, then assign values to them upon receiving SIGTERM, and never reset
> them, which assumes a process must die upon SIGTERM. Is the assumption
> true? I guess not. If a process receives SIGTERM and not die immediately,
> then die for other reason, then it may report a misleading PID and UID.
> > Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
> > (for whatever reason), then two scenarios can happen as far as I'm
> concerned:
> > * another SIGTERM comes in from the same or different uid/pid and it wll
> be
> > reported properly
> > * different SIGKILL, but in this case we won't report UID/PID at all
> >
> > am I missing something or do You have any particular scenario in mind?
> > The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
> > err reporting facilities.
> >
> >> 2.
> >> syncrpe.c uses errhint to print PID and UID, and postgres.c uses
> errdetail. We should keep consistency, maybe all use errhint.
> > Right, let's make it that way.
> >
> >> 3.
> >> ```
> >> @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
> >> QueryCancelPending = false;
> >> ereport(WARNING,
> >> (errmsg("canceling wait for
> synchronous replication due to user request"),
> >> - errdetail("The transaction has
> already committed locally, but might not have been replicated to the
> standby.")));
> >> + errdetail("The transaction has
> already committed locally, but might not have been replicated to the
> standby."),
> >> + proc_die_sender_pid ==
> 0 ? 0 :
> >> + errhint("Signal
> sent by PID %d, UID %d.",
> >> +
> proc_die_sender_pid, proc_die_sender_uid)
> >> + ));
> >> SyncRepCancelWait();
> >> break;
> >> }
> >> ```
> >>
> >> I don’t think the query cancel case relates to SIGTERM, so we don’t
> need to log PID and UID here.
> > Right, it was superfluous.
> >
> > v5 attached.
> >
>
>
> I'd kinda like to sneak this in for pg19, because I think it's useful.
> Here's a v6 that changes one or two things:
>
>
> - changes the globals to sig_atomic_t
>
> - in ProcessInterrupts, copies to local sender_pid/sender_uid, then
> zeros the globals before any ereport
>
> - uses errdetail() for all the messages
>
>
> Plus a few more cosmetic changes like consistent casing.
>
>
> cheers
>
>
> andrew
>
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
I just reviewed v6 and got 1 comment.
sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned
int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))
Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t)
(uid)))
Thanks!
--
wang jie
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-04-07 03:58:08 | Re: Add missing stats_reset column to pg_stat_database_conflicts view |
| Previous Message | Fujii Masao | 2026-04-07 03:54:41 | Re: pgsql: Reduce log level of some logical decoding messages from LOG to D |