BUG #19373: One backend hanging in AioIoUringExecution blocking other backends

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: michael(dot)kroell(at)gmail(dot)com
Subject: BUG #19373: One backend hanging in AioIoUringExecution blocking other backends
Date: 2026-01-09 10:14:28
Message-ID: 19373-aac0a0ee0aac6a8b@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 19373
Logged by: Michael Kröll
Email address: michael(dot)kroell(at)gmail(dot)com
PostgreSQL version: 18.1
Operating system: Linux 6.1.0-41-amd64 #1 SMP PREEMPT_DYNAMIC Debian
Description:

We upgraded to Pg18 with ``io_method=io_uring`` early last December, and
things ran smoothly until early last Sunday, when one of our simple SELECT
queries got stuck. The query is triggered a couple of thousand times a day
and usually runs for only milliseconds. It hung for almost 24h without
visible activity until I manually killed the backend (with -9 force).

The query looked like this in the backend:
| pid     | leader_pid | state_change                  | wait_event_type | wait_event          | state  |
|---------|------------|-------------------------------|-----------------|---------------------|--------|
| 2034811 |            | 2026-01-04 07:18:27.158077+01 | IO              | AioIoUringExecution | active |
| 3497711 | 2034811    | 2026-01-04 07:18:27.182794+01 | IPC             | MessageQueueSend    | active |
| 3497712 | 2034811    | 2026-01-04 07:18:27.184025+01 | IPC             | MessageQueueSend    | active |

The leader PID appeared to be waiting in io_uring_enter:

```bash
~ # strace -p 2034811
strace: Process 2034811 attached
io_uring_enter(20, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8

[root(at)host] 2026-01-05 07:58:11
~ # ltrace -p 2034811
io_uring_wait_cqes(0x7f3af3ea9e10, 0x7fff2bb25e00, 1, 0
```

Even though a *global* statement_timeout=61s was configured, other backends
accessing the same table were hanging on ``LWLock AioUringCompletion``.
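
For reference, a listing of the affected backends like the one above can be
pulled from pg_stat_activity with something along these lines. This is only a
sketch: the helper name ``list_aio_waiters``, the variable name, and the filter
on the three wait events seen in this report are assumptions for illustration,
not the exact query that was used.

```bash
# Hypothetical helper (not from the original report): list backends that are
# waiting on the wait events observed here, with parallel workers grouped
# under their leader. Connection settings come from the usual PG* variables.
aio_wait_query="
SELECT pid, leader_pid, state_change, wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE wait_event IN ('AioIoUringExecution', 'AioUringCompletion', 'MessageQueueSend')
ORDER BY coalesce(leader_pid, pid), pid;
"

list_aio_waiters() {
    # -X skips ~/.psqlrc so local settings do not change the output format;
    # extra arguments (e.g. -d dbname) are passed through to psql.
    psql -X -c "$aio_wait_query" "$@"
}
```

Calling ``list_aio_waiters -d somedb`` would then print the matching rows;
``somedb`` is a placeholder database name.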

Restarting the cluster did not go through until the hanging leader PID was
``SIGKILL``ed.

Nothing in the journal, the Pg log, or the kernel ring buffer hinted at
anything unusual around the time of the problematic backend's query_start.

Did anyone experience similar issues?

Is that a kernel/io_uring issue or something which Pg should/could handle?

```
Pg 18.1 (Debian 18.1-1.pgdg12+2)
Linux 6.1.0-41-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.158-1 (2025-11-09)
x86_64 GNU/Linux
```
