Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-23 10:09:00
Message-ID: CAA4eK1K4J1jZYLR_k_Cq-dMS7o7ke_yy3VP4-UK05yfOX3o-+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 4:58 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Mon, Dec 21, 2020 at 11:36 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 21, 2020 at 3:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Dec 21, 2020 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > Few other comments:
> > > > ==================
> > >
> > > Thanks for your feedback.
> > >
> > > > 1.
> > > > * FIXME 3 - Crashed tablesync workers may also have remaining slots
> > > > because I don't think
> > > > + * such workers are even iterated by this loop, and nobody else is
> > > > removing them.
> > > > + */
> > > > + if (slotname)
> > > > + {
> > > >
> > > > The above FIXME is not clear to me. Actually, the crashed workers
> > > > should restart, finish their work, and drop the slots. So not sure
> > > > what exactly this FIXME refers to?
> > >
> > > Yes, normally if the tablesync can complete it should behave like that.
> > > But I think there are other scenarios where it may be unable to
> > > clean-up after itself. For example:
> > >
> > > i) Maybe the crashed tablesync worker cannot finish. e.g. A row insert
> > > handled by tablesync can give a PK violation which also will crash
> > > again and again for each re-launched/replacement tablesync worker.
> > > This can be reproduced in the debugger. If the DropSubscription
> > > doesn't clean-up the tablesync's slot then nobody will.
> > >
> > > ii) Also DROP SUBSCRIPTION code has locking (see code commit) "to
> > > ensure that the launcher doesn't restart new worker during dropping
> > > the subscription".
> > >
> >
> > Yeah, I have also read that comment but do you know how it is
> > preventing relaunch? How does the subscription lock help?
>
> Hmmm. I did see there is a matching lock in get_subscription_list of
> launcher.c, which may be what that code comment was referring to. But
> I also am currently unsure how this lock prevents anybody (e.g.
> process_syncing_tables_for_apply) from executing another
> logicalrep_worker_launch.
>

process_syncing_tables_for_apply will be called by the apply worker
and we are stopping the apply worker. So, after that launcher won't
start a new apply worker because of get_subscription_list() and if the
apply worker is not started then it won't be able to start tablesync
worker. So, we need the handling of crashed tablesync workers here
such that we need to drop any new sync slots.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-12-23 10:28:54 Re: Deadlock between backend and recovery may not be detected
Previous Message Amit Kapila 2020-12-23 09:45:36 Re: Single transaction in the tablesync worker?