Re: Potential deadlock in pgaio_io_wait()

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Potential deadlock in pgaio_io_wait()
Date: 2025-09-22 14:53:28
Message-ID: ad74d2ee-f7d7-4423-baf1-b3d2f8846bf6@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 19/08/2025 03:07, Andres Freund wrote:
> On 2025-08-15 17:39:30 +1200, Thomas Munro wrote:
>> From ec2e1e21054f00918d3b28ce01129bc06de37790 Mon Sep 17 00:00:00 2001
>> From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
>> Date: Sun, 3 Aug 2025 23:07:56 +1200
>> Subject: [PATCH v2 1/2] aio: Fix pgaio_io_wait() for staged IOs.
>>
>> Previously, pgaio_io_wait()'s cases for PGAIO_HS_DEFINED and
>> PGAIO_HS_STAGED fell through to waiting for completion. The owner only
>> promises to advance it to PGAIO_HS_SUBMITTED. The waiter needs to be
>> prepared to call ->wait_one() itself once the IO is submitted in order
>> to guarantee progress and avoid deadlocks on IO methods that provide
>> ->wait_one().
>>
>> Introduce a new per-backend condition variable submit_cv, woken by by
>> pgaio_submit_stage(), and use it to wait for the state to advance. The
>> new broadcast doesn't seem to cause any measurable slowdown, so ideas
>> for optimizing the common no-waiters case were abandoned for now.
>>
>> It may not be possible to reach any real deadlock with existing AIO
>> users, but that situation could change. There's also no reason the
>> waiter shouldn't begin to wait via the IO method as soon as possible
>> even without a deadlock.
>>
>> Picked up by testing a proposed IO method that has ->wait_one(), like
>> io_method=io_uring, and code review.
>
> LGTM.

I just independently noticed this same issue, wrote a little test to
reproduce it, and was about to report it, when I noticed that you found
this already. Attached is the repro script.

Both of the proposed patches seem fine to me. I'm inclined to go with
the first patch (v2-0001-aio-Fix-pgaio_io_wait-for-staged-IOs.patch),
without the extra optimization, unless we can actually measure a
performance difference.

- Heikki

Attachment Content-Type Size
0001-Repro-backend-stuck-waiting-on-submitted-IO-with-io_.patch text/x-patch 3.6 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Aaron Ackerman 2025-09-22 19:39:38 Numeric Type Precision Not Respected in Function or Procedure Arguments
Previous Message Franz Philipp Moser 2025-09-22 12:06:43 Maybe problems with autovaccum?

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-09-22 14:54:07 Re: Having postgresql.org link to cgit instead of gitweb
Previous Message Maxime Schoemans 2025-09-22 14:38:53 Re: [PATCH] Check that index can return in get_actual_variable_range()