Re: Deadlock between logrep apply worker and tablesync worker

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Deadlock between logrep apply worker and tablesync worker
Date: 2023-01-23 06:08:51
Message-ID: CAA4eK1KqcjfwOo-UDc33c-qvjyGKqpijwydWOfX7seLNAi9L1w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 23, 2023 at 1:29 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Another thing that has a bad smell about it is the fact that
> process_syncing_tables_for_sync uses two transactions in the first
> place. There's a comment there claiming that it's for crash safety,
> but I can't help suspecting it's really because this case becomes a
> hard deadlock without that mid-function commit.
>
> It's not great in any case that the apply worker can move on in
> the belief that the tablesync worker is done when in fact the latter
> still has catalog state updates to make. And I wonder what we're
> doing with having both of them calling replorigin_drop_by_name
> ... shouldn't that responsibility belong to just one of them?
>

Originally, the origin was dropped in only one place (by the tablesync
worker), but we found a race condition, described in the comments in
process_syncing_tables_for_sync() just before the start of the second
transaction, which led to this change. See the report and discussion
of that race condition in the email [1].

[1] - https://www.postgresql.org/message-id/CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj=sLUOn4Gd2GjUAKG-fw@mail.gmail.com

--
With Regards,
Amit Kapila.
