Re: Logrep launcher race conditions leading to slow tests

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Logrep launcher race conditions leading to slow tests
Date: 2025-06-25 10:32:41
Message-ID: CAExHW5vj6HbjN65xBN+O=TxafidivA8pZJs2AXQUsshq=-jZSQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 24, 2025 at 9:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> BTW, it strikes me that if we're going to leave
> process_syncing_tables_for_apply() ignoring the result of
> logicalrep_worker_launch, it'd be smart to insert an explicit
> (void) cast to show that that's intentional. Otherwise Coverity
> is likely to complain about how we're ignoring the result in
> one place and not the other.

+1.

On Tue, Jun 24, 2025 at 10:03 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> writes:
> > On Tue, Jun 24, 2025 at 5:26 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> 4. In process_syncing_tables_for_apply (the other caller of
> >> logicalrep_worker_launch), it seems okay to ignore the
> >> result of logicalrep_worker_launch, but I think it should
> >> fill hentry->last_start_time before not after the call.
> >> Otherwise we might be changing a hashtable entry that's
> >> no longer relevant to this worker.
>
> > A hash entry is associated with a table, not the worker. In case the
> > worker fails to launch it records the time when worker launch for that
> > table was attempted so that next attempt could be well-spaced in time.
> > I am not able your last statement, what is the entry's relevance to
> > the worker.
>
> > But your change makes this code similar to ApplyLauncherMain(), which
> > deals with subscriptions. +1 for the consistency.
>
> Yeah, mainly I want to make it look more like ApplyLauncherMain().
> It's true right now that nothing outside this process will touch that
> hash table, so it doesn't matter which way we do it. But if we were
> to switch that table to being shared state, this'd be an unsafe order
> of operations for the same reasons it'd be wrong to do it like that in
> ApplyLauncherMain().

Makes sense to fix it proactively.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2025-06-25 10:45:59 Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson
Previous Message Fujii Masao 2025-06-25 10:02:29 Re: pg_dump misses comments on NOT NULL constraints