Re: PostgreSQL 16.6 , query stuck with STAT Ssl, wait_event_type : IPC , wait_event : ParallelFinish

From: Achilleas Mantzios <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com>
To: pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL 16.6 , query stuck with STAT Ssl, wait_event_type : IPC , wait_event : ParallelFinish
Date: 2025-05-31 21:12:50
Message-ID: c78f4a69-eeb6-49b2-8b95-d232b457d6fc@cloud.gatewaynet.com
Lists: pgsql-admin

On 31/5/25 23:43, Achilleas Mantzios wrote:

> Hi
>
> a query is stuck with the above; it seems it is waiting for a parallel worker
> to finish, however, there are no parallel workers running:
>
> postgres(at)[local]/dynacom=# SELECT application_name, backend_type, backend_start, xact_start,
> query_start, wait_event_type, wait_event, state FROM pg_stat_activity;
>  application_name |         backend_type         |         backend_start         |          xact_start           |          query_start          | wait_event_type |     wait_event      | state
> ------------------+------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------
>                   | autovacuum launcher          | 2024-11-29 17:48:50.92935+02  |                               |                               | Activity        | AutoVacuumMain      |
>                   | logical replication launcher | 2024-11-29 17:48:50.929496+02 |                               |                               | Activity        | LogicalLauncherMain |
>  DBMIRROR         | client backend               | 2025-05-31 19:04:16.724305+03 | 2025-05-31 19:05:21.686093+03 | 2025-05-31 19:05:21.909936+03 | IPC             | ParallelFinish      | active
>                   | client backend               | 2025-05-31 23:31:30.030806+03 |                               | 2025-05-31 23:35:05.045573+03 | Client          | ClientRead          | idle
>  psql             | client backend               | 2025-05-31 23:29:33.863485+03 | 2025-05-31 23:35:09.322972+03 | 2025-05-31 23:35:09.322972+03 |                 |                     | active
>  RXMLFVSLS        | client backend               | 2025-05-31 23:32:37.351131+03 |                               | 2025-05-31 23:35:09.295221+03 | Client          | ClientRead          | idle
>  psql             | client backend               | 2025-04-28 16:59:55.968442+03 |                               | 2025-05-27 16:43:56.338228+03 | Client          | ClientRead          | idle
>                   | background writer            | 2024-11-29 17:48:50.916876+02 |                               |                               | Activity        | BgWriterMain        |
>                   | archiver                     | 2024-12-03 18:57:36.447067+02 |                               |                               | Activity        | ArchiverMain        |
>                   | checkpointer                 | 2024-11-29 17:48:50.916648+02 |                               |                               | Activity        | CheckpointerMain    |
>                   | walwriter                    | 2024-11-29 17:48:50.928789+02 |                               |                               | Activity        | WalWriterMain       |
> (11 rows)
>
> postgres(at)[local]/dynacom=#
>
> So, I will terminate this backend now to get the system working again,
> but we are curious why this happened. Our system serves 22M+ transactions
> daily; this is Saturday night, hence the low traffic.
>
> postgres(at)smadb:~$ lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description:    Debian GNU/Linux 12 (bookworm)
> Release:        12
> Codename:       bookworm
> postgres(at)smadb:~$ psql -Aqt -c 'select version()'
> PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian
> 12.2.0-14) 12.2.0, 64-bit
> postgres(at)smadb:~$
>
>
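
Regarding the "no parallel workers running" part: a more targeted check is to
filter pg_stat_activity on leader_pid; just a sketch, with the stuck leader's
pid (1690535, see below) plugged in:

  SELECT pid, leader_pid, backend_type, wait_event_type, wait_event, state
  FROM pg_stat_activity
  WHERE backend_type = 'parallel worker' OR leader_pid = 1690535;

An empty result would confirm that the leader is waiting on workers that no
longer exist.
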
Some additional info:

1690535 is the pid in question.

We found no trace or indication of the OOM killer.
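
For reference, a check along these lines should surface any OOM kills on this
box; just a sketch, not necessarily the exact commands we used:

root(at)smadb:~# dmesg -T | grep -i -E 'oom|killed process'
root(at)smadb:~# journalctl -k | grep -i oom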

root(at)smadb:~# strace -p 1690535
strace: Process 1690535 attached

epoll_wait(12,
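
The epoll_wait() is consistent with the backend blocked in WaitLatch() (fd 12
is the eventpoll anon inode in the lsof output below), so a stack trace would
show the exact caller; just a sketch, assuming gdb and debug symbols are
available:

root(at)smadb:~# gdb -p 1690535 -batch -ex bt

(gdb pauses the backend only for the moment it takes to print the backtrace,
then detaches.)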

lsof also showed:

postgres  1690535 1690649 Sweeper  postgres    0r      CHR        1,3        0t0          4 /dev/null
postgres  1690535 1690649 Sweeper  postgres    1w     FIFO       0,14        0t0  213624178 pipe
postgres  1690535 1690649 Sweeper  postgres    2w     FIFO       0,14        0t0  213624178 pipe
postgres  1690535 1690649 Sweeper  postgres    3u  a_inode       0,15          0       1059 [signalfd]
postgres  1690535 1690649 Sweeper  postgres    4r     FIFO       0,14        0t0  213624177 pipe
postgres  1690535 1690649 Sweeper  postgres    5u  a_inode       0,15          0       1059 [eventpoll:3,4,11]
postgres  1690535 1690649 Sweeper  postgres    6u      REG       8,32       8192  157352475 /raid4/pgsql/data/PG_16_202307071/207491653/2601
postgres  1690535 1690649 Sweeper  postgres    7u      REG       8,32     450560  157352631 /raid4/pgsql/data/PG_16_202307071/207491653/207493206
postgres  1690535 1690649 Sweeper  postgres    8u      REG       8,32      40960  157356847 /raid4/pgsql/data/PG_16_202307071/207491653/207503536
postgres  1690535 1690649 Sweeper  postgres    9u      REG       8,32      40960  157356848 /raid4/pgsql/data/PG_16_202307071/207491653/207503538
postgres  1690535 1690649 Sweeper  postgres   10u      REG       8,32      40960  157357848 /raid4/pgsql/data/PG_16_202307071/207491653/207504627
postgres  1690535 1690649 Sweeper  postgres   11u     IPv4 1241927029        0t0        TCP smadb.internal.net:postgresql->sma.internal.net:42615 (ESTABLISHED)
postgres  1690535 1690649 Sweeper  postgres   12u  a_inode       0,15          0       1059 [eventpoll:3,4]

So, as far as we understand, it was waiting on an (eventpoll) anon inode?

I tried pg_terminate_backend(1690535), which did nothing to the process,
and then pg_cancel_backend(1690535), also to no effect.
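
For what it's worth, both calls only send a signal (SIGINT for
pg_cancel_backend, SIGTERM for pg_terminate_backend), so a backend that never
reaches an interrupt check will ignore them. On 14+ the two-argument form at
least reports whether the backend really went away; a sketch:

  SELECT pg_terminate_backend(1690535, 5000);  -- wait up to 5 seconds; false means it is still there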

I went to the shell and did a normal

kill 1690535

It did nothing, and before resorting to kill -9, I tried kill -HUP, and the
backend woke up with:

10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres(at)dynacom line:4 FATAL:  terminating connection due to administrator command
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres(at)dynacom line:5 STATEMENT:  SELECT pd.XID, MAX(SeqId)
    FROM dbmirror_Pending pd
         LEFT JOIN dbmirror_MirroredTransaction mt
              INNER JOIN dbmirror_MirrorHost mh
                      ON mt.MirrorHostId = mh.MirrorHostId AND mh.HostName = '192.168.211.1'
              ON pd.XID = mt.XID
    WHERE mt.XID is null and (pd.slaveid is null or pd.slaveid = '579')
    GROUP BY pd.XID
    ORDER BY MAX(pd.SeqId)
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.234 EEST DBMIRROR postgres(at)dynacom line:6 LOG:  disconnection: session time: 4:49:31.510 user=postgres database=dynacom host=10.9.0.10 port=42615
