Re: 035_standby_logical_decoding unbounded hang

From: Noah Misch <noah(at)leadboat(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: 035_standby_logical_decoding unbounded hang
Date: 2024-02-15 20:48:16
Message-ID: 20240215204816.cb.nmisch@google.com
Lists: pgsql-hackers

On Wed, Feb 14, 2024 at 03:31:16PM +0000, Bertrand Drouvot wrote:
> On Sat, Feb 10, 2024 at 05:02:27PM -0800, Noah Misch wrote:
> > The 035_standby_logical_decoding.pl hang is
> > a race condition arising from an event sequence like this:
> >
> > - Test script sends CREATE SUBSCRIPTION to subscriber, which loses the CPU.
> > - Test script calls pg_log_standby_snapshot() on primary. Emits XLOG_RUNNING_XACTS.
> > - checkpoint_timeout makes a primary checkpoint finish. Emits XLOG_RUNNING_XACTS.
> > - bgwriter executes LOG_SNAPSHOT_INTERVAL_MS logic. Emits XLOG_RUNNING_XACTS.
> > - CREATE SUBSCRIPTION wakes up and sends CREATE_REPLICATION_SLOT to standby.
> >
> > Other test code already has a solution for this, so the attached patches add a
> > timeout and copy the existing solution. I'm also attaching the hack that
> > makes it 100% reproducible.

> I did a few tests and confirm that the proposed solution fixes the corner case.

Thanks for reviewing.

> What about creating a sub, say wait_for_restart_lsn_calculation() in Cluster.pm
> and then make use of it in create_logical_slot_on_standby() and above? (something
> like wait_for_restart_lsn_calculation-v1.patch attached).

Waiting for restart_lsn is just a prerequisite for calling
pg_log_standby_snapshot(), so I wouldn't separate those two. If we're
extracting a sub, I would move the pg_log_standby_snapshot() call into the sub
and make the API like one of these:

  $standby->wait_for_subscription_starting_point($primary, $slot_name);
  $primary->log_standby_snapshot($standby, $slot_name);

Would you like to finish the patch in such a way?
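
For concreteness, the second form might look roughly like the sketch below.
It is untested and not part of the attached patches. It assumes the sub would
be added to PostgreSQL::Test::Cluster as a method (so $self is the primary),
and it reuses the poll_query_until() / safe_psql() pattern that
create_logical_slot_on_standby() already follows. The sub name and argument
order are taken from the proposal above; everything else is illustrative.

```perl
use strict;
use warnings;

# Hypothetical sketch only.  $self is the primary node, $standby is the
# standby on which the logical slot is being created.
sub log_standby_snapshot
{
    my ($self, $standby, $slot_name) = @_;

    # Once the slot's restart_lsn is determined, the standby looks for an
    # xl_running_xacts record from restart_lsn onward.  Wait for that first,
    # so the record emitted below cannot land before the slot exists.
    # poll_query_until() returns false on timeout, which bounds the wait.
    $standby->poll_query_until(
        'postgres', qq[
        SELECT restart_lsn IS NOT NULL
        FROM pg_catalog.pg_replication_slots WHERE slot_name = '$slot_name'
    ])
      or die
      "timed out waiting for logical slot to calculate its restart_lsn";

    # Then emit the xl_running_xacts record that slot creation is waiting for.
    $self->safe_psql('postgres', 'SELECT pg_log_standby_snapshot()');

    return;
}
```

A caller would start CREATE SUBSCRIPTION (or pg_recvlogical --create-slot) in
the background and then invoke $primary->log_standby_snapshot($standby,
$slot_name), so the wait and the snapshot always travel together.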
