Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages
Date: 2025-06-26 11:55:42
Message-ID: CALDaNm3Nbf5gDi0dqFt69U1a4fvfBptZuz4CVmFzDk4C4XDT_Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 26 Jun 2025 at 06:22, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jun 25, 2025 at 10:19:55PM +0530, vignesh C wrote:
> > Currently, the logic attempts to read the complete WAL record based on
> > the size obtained before the crash—even though only a partial record
> > was written. It then checks the page header to determine whether the
> > XLP_FIRST_IS_OVERWRITE_CONTRECORD flag is set only after reading the
> > complete WAL record at XLogDecodeNextRecord function, but since that
> > much WAL data was not available in the system we never get a chance to
> > check the header after this.. To address this issue, a more robust
> > approach would be to first read the page header, check if the
> > XLP_FIRST_IS_OVERWRITE_CONTRECORD flag is set, and only then proceed
> > to read the WAL record size if the record is not marked as a partial
> > overwrite. This would prevent the system from waiting for WAL data
> > that will never arrive. Attached partial_wal_record_fix.patch patch
> > for this.
>
> So you are suggesting the addition of an extra ReadPageInternal() that
> forces a read of only the read, perform the checks on the header, then
> read the rest. After reading SizeOfXLogShortPHD worth of data,
> shouldn't the checks on xlp_rem_len be done a bit earlier than what
> you are proposing in this patch?

Modified

> > I don't have a consistent test to reproduce this issue, currently this
> > issue can be reproduced by running the 046_checkpoint_logical_slot
> > test about 50 times. This test 046_checkpoint_logical_slot was
> > reverted recently after it caused a few buildfarm failures discussed
> > at [2]. The same test is also attached as
> > reverted_test_046_checkpoint_logical_slot.patch.
>
> Seeing the noise that this originally created in the CFbot and the
> buildfarm, even if the issue would be only triggered after a timeout,
> I'd vote for relying on this test being good enough for the purpose of
> this race condition.

I agree

Another reliable approach would be to make the
> code wait before reading the record in the internal loop of
> ReadPageInternal() with an injection point when we know that we have a
> contrecord, but I'm not really excited about this prospect in
> xlogreader.c which can be also used in the frontend.

I could not find an easy way to simulate this scenario through
injection point, I felt 046_checkpoint_logical_slot.pl should be
enough.

The attached v2 version patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v2-0001-Fix-infinite-wait-when-reading-partially-written-.patch text/x-patch 7.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2025-06-26 12:06:34 Re: Remove HeapTupleheaderSetXmin{Committed,Invalid} functions
Previous Message Jakub Wartak 2025-06-26 11:36:47 Re: NUMA shared memory interleaving