From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Potential deadlock in pgaio_io_wait() |
Date: | 2025-09-22 14:53:28 |
Message-ID: | ad74d2ee-f7d7-4423-baf1-b3d2f8846bf6@iki.fi |
Lists: | pgsql-bugs pgsql-hackers |
On 19/08/2025 03:07, Andres Freund wrote:
> On 2025-08-15 17:39:30 +1200, Thomas Munro wrote:
>> From ec2e1e21054f00918d3b28ce01129bc06de37790 Mon Sep 17 00:00:00 2001
>> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
>> Date: Sun, 3 Aug 2025 23:07:56 +1200
>> Subject: [PATCH v2 1/2] aio: Fix pgaio_io_wait() for staged IOs.
>>
>> Previously, pgaio_io_wait()'s cases for PGAIO_HS_DEFINED and
>> PGAIO_HS_STAGED fell through to waiting for completion. The owner only
>> promises to advance it to PGAIO_HS_SUBMITTED. The waiter needs to be
>> prepared to call ->wait_one() itself once the IO is submitted in order
>> to guarantee progress and avoid deadlocks on IO methods that provide
>> ->wait_one().
>>
>> Introduce a new per-backend condition variable submit_cv, woken by
>> pgaio_submit_stage(), and use it to wait for the state to advance. The
>> new broadcast doesn't seem to cause any measurable slowdown, so ideas
>> for optimizing the common no-waiters case were abandoned for now.
>>
>> It may not be possible to reach any real deadlock with existing AIO
>> users, but that situation could change. There's also no reason the
>> waiter shouldn't begin to wait via the IO method as soon as possible
>> even without a deadlock.
>>
>> Picked up by testing a proposed IO method that has ->wait_one(), like
>> io_method=io_uring, and code review.
>
> LGTM.
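
For readers following along, the waiting scheme described in the quoted commit message can be sketched roughly as below. This is only an illustration using POSIX threads, not the PostgreSQL code: `DemoIo`, `demo_wait_one()`, `demo_io_wait()`, `owner_thread()` and `run_demo()` are hypothetical names, and the enum values merely mirror the PGAIO_HS_* states mentioned above. The point it tries to show is that the waiter sleeps on submit_cv only until the IO reaches the submitted state, and then drives completion itself rather than relying on the owner.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the handle states named in the commit message. */
typedef enum { IO_DEFINED, IO_STAGED, IO_SUBMITTED, IO_COMPLETED } IoState;

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  submit_cv;      /* broadcast when the owner submits */
    IoState         state;
    bool            waiter_drove_completion;
} DemoIo;

/* Stand-in for an IO method's ->wait_one(); caller must hold io->lock. */
static void
demo_wait_one(DemoIo *io)
{
    io->state = IO_COMPLETED;
    io->waiter_drove_completion = true;
}

/* Owner: only promises to advance the IO to SUBMITTED, then wakes waiters. */
static void *
owner_thread(void *arg)
{
    DemoIo *io = arg;

    pthread_mutex_lock(&io->lock);
    io->state = IO_SUBMITTED;
    pthread_cond_broadcast(&io->submit_cv);
    pthread_mutex_unlock(&io->lock);
    return NULL;
}

/* Waiter: sleep until the IO is submitted, then complete it ourselves. */
static IoState
demo_io_wait(DemoIo *io)
{
    IoState result;

    pthread_mutex_lock(&io->lock);
    while (io->state < IO_SUBMITTED)
        pthread_cond_wait(&io->submit_cv, &io->lock);
    if (io->state == IO_SUBMITTED)
        demo_wait_one(io);          /* do not wait for the owner to do this */
    result = io->state;
    pthread_mutex_unlock(&io->lock);
    return result;
}

/* Runs one owner and one waiter against an IO that starts out STAGED. */
static IoState
run_demo(void)
{
    DemoIo    io;
    pthread_t owner;
    IoState   result;
    int       rc;

    pthread_mutex_init(&io.lock, NULL);
    pthread_cond_init(&io.submit_cv, NULL);
    io.state = IO_STAGED;
    io.waiter_drove_completion = false;

    rc = pthread_create(&owner, NULL, owner_thread, &io);
    assert(rc == 0);
    (void) rc;

    result = demo_io_wait(&io);
    pthread_join(owner, NULL);
    assert(io.waiter_drove_completion);

    pthread_cond_destroy(&io.submit_cv);
    pthread_mutex_destroy(&io.lock);
    return result;
}
```

Calling `run_demo()` from a `main()` (built with `cc -pthread`) should return `IO_COMPLETED`. In the sketch the owner never completes the IO, so a waiter that simply fell through to waiting for completion would block forever, which is the hang the patch is meant to rule out.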

I just independently noticed this same issue, wrote a little test to
reproduce it, and was about to report it, when I noticed that you found
this already. Attached is the repro script.

Both of the proposed patches seem fine to me. I'm inclined to go with
the first patch (v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs.patch),
without the extra optimization, unless we can actually measure a
performance difference.

- Heikki

Attachment | Content-Type | Size |
---|---|---|
0001-Repro-backend-stuck-waiting-on-submitted-IO-with-io_.patch | text/x-patch | 3.6 KB |