Re: Potential deadlock in pgaio_io_wait()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Potential deadlock in pgaio_io_wait()
Date: 2025-08-15 05:39:30
Message-ID: CA+hUKGJhZLeNE8oDJKQC6n=9g44=54qf18sSF8zj6kfTBf-9Ow@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

I discussed this off-list with Andres who provided the following review:

* +1 for the analysis
* +1 for the solution
* his benchmark machine shows no regression under heavy IO submission workload
* needs better comments

I had expected that patch to be rejected as too slow. I was thinking
that it should be enough to insert a memory barrier and then do an
unlocked check for an empty waitlist in ConditionVariableBroadcast()
to avoid a spinlock acquire/release. That caused a few weird hangs I
didn't understand, which is why I didn't post that version last time,
but he pointed out that the other side also needs a memory barrier in
ConditionVariablePrepareToSleep() and the existing spinlock
acquire/release is not enough. Still processing that, but given that
ConditionVariableBroadcast() performance is already sufficient, there
is no need for fancy tricks for this problem and my "naive" patch is
actually fine.

So here's a version that just adds some comments and corrects a minor thinko.

I also thought of a small optimisation, presented in the -B patch.
It's a bit of a shame to wait for backend->submit_cv and then also
ioh->cv in io_method=worker. It's just a bunch of IPC ping-pong for
nothing. So I figured it should be allowed to fall all the way
through based on its lack of ->wait_one. Worth bothering with?

On a superficial note:

AIO_IO_COMPLETION "Waiting for another process to complete IO."
+AIO_IO_SUBMIT "Waiting for another process to submit IO."
AIO_IO_URING_SUBMIT "Waiting for IO submission via io_uring."
AIO_IO_URING_EXECUTION "Waiting for IO execution via io_uring."

We're inconsistent in our choice of noun or verb. I went with _SUBMIT
to match the following line, rather than _SUBMISSION to match the
preceding line. Shrug.

Attachment Content-Type Size
v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs.patch application/octet-stream 5.1 KB
v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs-B.patch application/octet-stream 6.3 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2025-08-15 06:21:47 Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database
Previous Message Dilip Kumar 2025-08-14 11:34:02 Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-08-15 05:45:57 RE: Conflict detection for update_deleted in logical replication
Previous Message Zhijie Hou (Fujitsu) 2025-08-15 05:25:50 RE: memory leak in logical WAL sender with pgoutput's cachectx