From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Potential deadlock in pgaio_io_wait() |
Date: | 2025-08-15 05:39:30 |
Message-ID: | CA+hUKGJhZLeNE8oDJKQC6n=9g44=54qf18sSF8zj6kfTBf-9Ow@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
I discussed this off-list with Andres who provided the following review:
* +1 for the analysis
* +1 for the solution
* his benchmark machine shows no regression under heavy IO submission workload
* needs better comments
I had expected that patch to be rejected as too slow. I was thinking
that it should be enough to insert a memory barrier and then do an
unlocked check for an empty waitlist in ConditionVariableBroadcast()
to avoid a spinlock acquire/release. That caused a few weird hangs I
didn't understand, which is why I didn't post that version last time,
but he pointed out that the other side also needs a memory barrier in
ConditionVariablePrepareToSleep() and the existing spinlock
acquire/release is not enough. Still processing that, but given that
ConditionVariableBroadcast() performance is already sufficient, there
is no need for fancy tricks for this problem and my "naive" patch is
actually fine.
So here's a version that just adds some comments and corrects a minor thinko.
I also thought of a small optimisation, presented in the -B patch.
It's a bit of a shame to wait for backend->submit_cv and then also
ioh->cv in io_method=worker. It's just a bunch of IPC ping-pong for
nothing. So I figured it should be allowed to fall all the way
through based on its lack of ->wait_one. Worth bothering with?
On a superficial note:
AIO_IO_COMPLETION "Waiting for another process to complete IO."
+AIO_IO_SUBMIT "Waiting for another process to submit IO."
AIO_IO_URING_SUBMIT "Waiting for IO submission via io_uring."
AIO_IO_URING_EXECUTION "Waiting for IO execution via io_uring."
We're inconsistent in our choice of noun or verb. I went with _SUBMIT
to match the following line, rather than _SUBMISSION to match the
preceding line. Shrug.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs.patch | application/octet-stream | 5.1 KB |
v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs-B.patch | application/octet-stream | 6.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-08-15 06:21:47 | Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database |
Previous Message | Dilip Kumar | 2025-08-14 11:34:02 | Re: BUG #18988: DROP SUBSCRIPTION locks not-yet-accessed database |
From | Date | Subject | |
---|---|---|---|
Next Message | Zhijie Hou (Fujitsu) | 2025-08-15 05:45:57 | RE: Conflict detection for update_deleted in logical replication |
Previous Message | Zhijie Hou (Fujitsu) | 2025-08-15 05:25:50 | RE: memory leak in logical WAL sender with pgoutput's cachectx |