Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-01-14 06:40:47
Message-ID: CAA4eK1JS2x6CKzduvZNJM1v6=tEnYrQdBLQGEzpeX94eod9pAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 13, 2021 at 5:07 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 12, 2021 at 6:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 11, 2021 at 3:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The workers for removed tables are now immediately stopped (like
> > DropSubscription does). Although I did include the AccessExclusiveLock
> > as (a) suggested, AFAIK this was actually ineffective at preventing
> > the workers relaunching.
> >
>
> The reason why it was ineffective is that you are locking
> SubscriptionRelationId which is to protect relaunch of apply workers
> not tablesync workers. But in current form even acquiring
> SubscriptionRelRelationId lock won't serve the purpose because
> process_syncing_tables_for_apply() doesn't always acquire it before
> relaunching the tablesync workers. However, if we acquire
> SubscriptionRelRelationId in process_syncing_tables_for_apply() then
> it would prevent relaunch of workers but not sure if that is a good
> idea. Can you think of some other way?
>
> > Instead, I am using
> > logicalrep_worker_stop_at_commit to do this - testing shows it as
> > working ok. Please see the code and latest test logs [v14] for
> > details.
> >
>
> There is still a window where it can relaunch. Basically, after you
> stop the worker in AlterSubscription_refresh and till the commit
> happens apply worker can relaunch the tablesync workers. I don't see
> code-wise how we can protect that. And if the tablesync workers are
> restarted after we stopped them, the purpose won't be achieved because
> it can recreate or try to reuse the slot which we have dropped.
>
> The other issue with the current code could be that after we drop the
> slot and origin what if the transaction (in which we are doing Alter
> Subscription) is rolledback? Basically, the workers will be relaunched
> and it would assume that slot should be there but the slot won't be
> present. I have thought of dropping the slot at commit time after we
> stop the workers but again not sure if that is a good idea because at
> that point we don't want to establish the connection with the
> publisher.
>
> I think this needs some more thought.
>

I have another idea to solve this problem. Instead of Alter
Subscription drop the slot/origin, we can let tablesync worker do it.
Basically, we need to register SignalHandlerForShutdownRequest as
SIGTERM handler and then later need to check ShutdownRequestPending
flag in the tablesync worker. If the flag is set, then we can drop the
slot/origin and allow the process to exit cleanly.

This will obviate the need to take the lock and all sort of rollback
problems. If this works out well then I think we can use this for
DropSubscription as well but that is a matter of separate patch.

Thoughts?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-01-14 06:43:25 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
Previous Message Pavel Stehule 2021-01-14 06:35:29 Re: proposal: schema variables