From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-05-27 13:06:51 |
Message-ID: | CA+hUKGKh1XP=0NHQ5pb=4cY4r8wpaw2nAORiLQZYLKva6hO+FQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, May 26, 2025 at 12:05 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Could you guys please share your exact repro steps?
>
> I've just been running 027_stream_regress.pl over and over.
> It's not a recommendable answer though because the failure
> probability is tiny, under 1%. It sounded like Alexander
> had a better way.
Could you please share your configure options?
While flailing around in the dark and contemplating sources of
nondeterminism that might come from a small system under a lot of load
(as hinted at by Alexander's mention of running the test in parallel)
with a 1MB buffer pool (as used by 027_stream_regress.pl via Cluster.pm's
settings for replication tests), I thought about partial reads:
--- a/src/backend/storage/aio/aio_io.c
+++ b/src/backend/storage/aio/aio_io.c
@@ -128,6 +128,8 @@ pgaio_io_perform_synchronously(PgAioHandle *ioh)
             result = pg_preadv(ioh->op_data.read.fd, iov,
                                ioh->op_data.read.iov_length,
                                ioh->op_data.read.offset);
+            if (result > BLCKSZ && rand() < RAND_MAX / 2)
+                result = BLCKSZ;
... and the fallback path for io_method=worker that runs IOs
synchronously when the submission queue overflows because the I/O
workers aren't keeping up:
--- a/src/backend/storage/aio/method_worker.c
+++ b/src/backend/storage/aio/method_worker.c
@@ -253,7 +253,7 @@ pgaio_worker_submit_internal(int nios, PgAioHandle *ios[])
         for (int i = 0; i < nios; ++i)
         {
             Assert(!pgaio_worker_needs_synchronous_execution(ios[i]));
-            if (!pgaio_worker_submission_queue_insert(ios[i]))
+            if (rand() < RAND_MAX / 2 ||
+                !pgaio_worker_submission_queue_insert(ios[i]))
             {
                 /*
                  * We'll do it synchronously, but only after we've sent as many as
... but still no dice here...
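
For anyone who wants to play with the partial-read idea outside the tree, here is a minimal standalone sketch. It is not PostgreSQL code and makes several assumptions: DEMO_BLCKSZ, demo_pread_maybe_short() and demo_read_fully() are hypothetical names invented for illustration, plain pread() stands in for pg_preadv(), and 8192 is assumed as the block size. It only shows the general technique of truncating a successful multi-block read to one block about half the time, plus a caller that has to cope by retrying.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: mirrors PostgreSQL's default block size; illustrative only. */
#define DEMO_BLCKSZ 8192

/*
 * Hypothetical fault-injection wrapper around pread().  Roughly half of the
 * successful reads larger than one block are reported as exactly one block,
 * imitating a short read that the kernel is always allowed to return.
 */
ssize_t
demo_pread_maybe_short(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t result = pread(fd, buf, len, offset);

    if (result > DEMO_BLCKSZ && rand() < RAND_MAX / 2)
        result = DEMO_BLCKSZ;
    return result;
}

/*
 * Read len bytes starting at offset, retrying after every short read.
 * Returns the number of bytes read (less than len only at EOF), or -1 with
 * errno set on error.
 */
ssize_t
demo_read_fully(int fd, char *buf, size_t len, off_t offset)
{
    size_t done = 0;

    assert(buf != NULL);

    while (done < len)
    {
        ssize_t n = demo_pread_maybe_short(fd, buf + done, len - done,
                                           offset + (off_t) done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

A caller that assumes a single call always fills the buffer would misbehave under demo_pread_maybe_short(), whereas demo_read_fully() should return the same bytes regardless of how the reads happen to be split. That is the kind of difference the injected short reads are meant to shake out.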