RE: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'vignesh C' <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: RE: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages
Date: 2025-06-30 12:11:30
Message-ID: OSCPR01MB1496661B939CC826215F370D0F546A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Vignesh,

> I was unable to reproduce the same test failure on the PG17 branch,
> even after running the test around 500 times. However, on the master
> branch, the failure consistently reproduces approximately once in
> every 50 runs. I also noticed that while the buildfarm has reported
> multiple failures for this test for the master branch, none of them
> appear to be on the PG17 branch. I'm not yet sure why this discrepancy
> exists.

I was also unable to reproduce it as-is. After analyzing a bit more, I found
that on PG17 the workload cannot generate an FPI_FOR_HINT record. Because this
type of WAL record is longer than a page, on HEAD there is a possibility that
the record is flushed only partially. On PG17 that cannot happen, so
OVERWRITE_CONTRECORD never appears.

I modified the test code as in [1] and confirmed that the same hang can happen
on PG17. The change generates a long record that can cross a page boundary and
be flushed partially.

[1]:
```
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -123,6 +123,10 @@ $node->safe_psql('postgres',
$node->safe_psql('postgres',
q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});

+# Generate a long WAL record
+$node->safe_psql('postgres',
+ q{select pg_logical_emit_message(false, '', repeat('123456789', 1000))});
```
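
As a rough check of why this message is long enough, here is a small Python
sketch of the arithmetic. It assumes the default WAL page size (XLOG_BLCKSZ)
of 8192 bytes; the function names are only illustrative and are not part of
the test. It counts the payload alone and ignores record and page headers,
which only make the record longer.

```python
# Assumption: default WAL page size (XLOG_BLCKSZ); builds may configure another value.
XLOG_BLCKSZ = 8192


def payload_len(chunk: str, repeats: int) -> int:
    """Byte length of SQL repeat(chunk, repeats) for an ASCII chunk."""
    return len(chunk.encode("ascii")) * repeats


def must_cross_page(payload_bytes: int, page_size: int = XLOG_BLCKSZ) -> bool:
    """True if the payload alone is larger than one WAL page, so the record
    has to cross a page boundary no matter where it starts."""
    return payload_bytes > page_size


# Payload used in the modified test: repeat('123456789', 1000)
size = payload_len("123456789", 1000)
print(size, must_cross_page(size))  # 9000 True
```

Since 9000 bytes exceeds 8192, the record cannot fit in a single page, which
is what makes the partial flush possible.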

Best regards,
Hayato Kuroda
FUJITSU LIMITED
