Re: BUG #18155: Logical Apply Worker Timeout During TableSync Causes Either Stuckness or Data Loss

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Callahan, Drew" <callaan(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18155: Logical Apply Worker Timeout During TableSync Causes Either Stuckness or Data Loss
Date: 2023-10-18 01:22:14
Message-ID: ZS8zRgKfB7AcxJWv@paquier.xyz
Lists: pgsql-bugs

On Tue, Oct 17, 2023 at 09:50:52AM +0530, Amit Kapila wrote:
> On Tue, Oct 17, 2023 at 4:46 AM Callahan, Drew <callaan(at)amazon(dot)com> wrote:
>> On the server side, we did not see evidence of WALSenders being launched. As a result, the gap kept increasing further
>> and further since the workers would not transition to the catchup state after several hours due to this.
>
> One possibility is that the system has reached
> 'max_logical_replication_workers' limit due to which it is not
> allowing to launch the apply worker. If so, then consider increasing
> the value of 'max_logical_replication_workers'. You can query
> 'pg_stat_subscription' to know more information about workers. See the
> description of subscriber-side parameters [1].
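Following Amit's suggestion, here is a sketch of one way to read pg_stat_subscription output against the worker limit. It assumes the rows were fetched with a query such as SELECT subname, pid, relid FROM pg_stat_subscription; The view reports one row per subscription for the main apply worker, with a NULL pid when that worker is not running, plus one row per tablesync worker, which has relid set. WorkerRow and summarize_workers are hypothetical names. The comparison to max_logical_replication_workers is only a rough check, because that limit also covers other logical replication workers, such as parallel apply workers on newer releases, and this sketch does not tell them apart.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerRow:
    """One row of: SELECT subname, pid, relid FROM pg_stat_subscription;"""
    subname: str
    pid: Optional[int]    # None when the worker is not running
    relid: Optional[int]  # None for the main apply worker, set for tablesync workers


def summarize_workers(rows: List[WorkerRow], max_logical_replication_workers: int) -> dict:
    # Only rows with a pid correspond to running worker processes.
    running = [r for r in rows if r.pid is not None]
    # Tablesync workers are the rows that report the relation being synchronized.
    tablesync = [r for r in running if r.relid is not None]
    # A main-worker row with no pid means that subscription's apply worker is not running.
    missing_apply = sorted({r.subname for r in rows if r.relid is None and r.pid is None})
    return {
        "running": len(running),
        "tablesync": len(tablesync),
        "subs_without_apply": missing_apply,
        # Rough check: running workers versus the configured limit.
        "at_limit": len(running) >= max_logical_replication_workers,
    }


# Made-up example rows, for illustration only.
rows = [
    WorkerRow("sub1", None, None),    # apply worker not running
    WorkerRow("sub1", 101, 16384),    # tablesync worker
    WorkerRow("sub1", 102, 16385),    # tablesync worker
    WorkerRow("sub2", 200, None),     # apply worker
    WorkerRow("sub2", 201, 16390),    # tablesync worker
]
print(summarize_workers(rows, max_logical_replication_workers=4))
# {'running': 4, 'tablesync': 3, 'subs_without_apply': ['sub1'], 'at_limit': True}
```

In this made-up example, sub1 has tablesync workers holding slots while its own apply worker is absent and the pool is full, which is the pattern Amit's suggestion would look for.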

Hmm. So you basically mean that not being able to launch new workers
prevents the existing workers from moving on with their individual
syncs, which would free their slots for other tables once done. This
then causes all of the existing workers to remain in a syncwait state,
further increasing the gap in WAL replay. Am I getting that right?
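The scenario described above can be reduced to a toy model of slot exhaustion. This is not PostgreSQL's launcher or tablesync code. syncwait_can_progress is a hypothetical function, and it assumes three things: apply and tablesync workers draw from a single pool of max_workers slots, a tablesync worker in syncwait keeps its slot until the apply worker for its subscription moves it forward, and nothing else frees a slot. Under those assumptions, the waiting workers are stuck exactly when the apply worker is gone and every slot is taken.

```python
def syncwait_can_progress(max_workers: int, syncwait_workers: int, apply_running: bool) -> bool:
    """Toy model: can tablesync workers waiting in syncwait eventually move on?

    Assumes (hypothetically) one shared pool of max_workers slots, that each
    syncwait tablesync worker holds its slot until the apply worker advances
    it, and that no other event frees a slot.
    """
    used = syncwait_workers + (1 if apply_running else 0)
    if used > max_workers:
        raise ValueError("more workers than available slots")
    if apply_running:
        # The apply worker is present and can advance the waiting workers.
        return True
    # The apply worker must be relaunched first, which needs a free slot.
    return max_workers - syncwait_workers > 0


print(syncwait_can_progress(4, 4, apply_running=False))  # False: every slot is held
print(syncwait_can_progress(4, 3, apply_running=False))  # True: one slot is free
```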
--
Michael
