Re: Add errdetail() with PID and UID about source of termination signal

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add errdetail() with PID and UID about source of termination signal
Date: 2026-04-16 05:50:25
Message-ID: CAKZiRmwXSn4VGHpYXNsOe1KgArDnMXg3pc1CZWjZWwd=_njCxA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 15, 2026 at 6:23 PM Jacob Champion
<jacob(dot)champion(at)enterprisedb(dot)com> wrote:
>
> On Wed, Apr 15, 2026 at 7:17 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > OK, pushed. Thanks.
>
> I hit the following in the pg_basebackup tests just now, running on Linux:
>
> [08:41:21.621](0.377s) ok 196 - Walsender killed
> [09:09:11.134](1669.513s) # pump_until: timeout expired when
> searching for "(?^:background process terminated unexpectedly)" with
> stream: "pg_basebackup: error: unexpected termination of replication
> stream: FATAL: terminating connection due to administrator command
> # DETAIL: Signal sent by PID 155573, UID 1000.
> # "
> [09:09:11.134](0.000s) not ok 197 - background process exit message
> [09:09:11.134](0.000s) # Failed test 'background process exit message'
> # at src/postgres/src/bin/pg_basebackup/t/010_pg_basebackup.pl line 1049.
>
> But I haven't been able to reproduce since, so I don't know if this is
> a new race, or the commit just exposed one that was there before?

Hi Jacob, the time the base backup took seems strange to me (27 minutes?!).
It was properly killed by a timeout, and the new code reported the exact
PID that sent the terminating signal.

If you happen to see it running that long again, it might make sense
to find out where the time is being spent during that base backup (in
this test we shouldn't be taking large backups).

An alternative would be to check the PostgreSQL server logs of that
specific failed run to see exactly where it was stuck after 08:41 (but
before 09:09).
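
A rough sketch of that log check, in case it helps. It assumes every
server log line starts with a "%m"-style timestamp (for example
"2026-04-15 08:41:21.621 UTC"); that prefix format and the helper names
(parse_ts, lines_in_window, largest_gap) are my own assumptions and not
anything from the test suite, so adjust them to the actual
log_line_prefix in use. It keeps only the lines inside a time window and
reports the longest silent stretch, which is probably where the process
was stuck.

```python
import re
from datetime import datetime

# Assumption: lines begin with "YYYY-MM-DD HH:MM:SS.mmm" (log_line_prefix '%m ...').
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if m is None:
        return None  # e.g. a continuation line of a multi-line message
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")


def lines_in_window(lines, start, end):
    """Return (timestamp, line) pairs with start <= timestamp <= end."""
    out = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and start <= ts <= end:
            out.append((ts, line.rstrip("\n")))
    return out


def largest_gap(entries):
    """Return (gap_seconds, line_before_gap) for the longest silence, or None."""
    best = None
    for (t0, l0), (t1, _) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, l0)
    return best
```

Calling lines_in_window() on the lines of the server log with a window
of roughly 08:41:21 to 09:09:12 on the day of the run, and then
largest_gap() on the result, should point at the last message logged
before the stall.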

-J.
