Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-01-08 02:50:43
Message-ID: CAA4eK1+ayKaOk_qZ3CCq9xHaHj7TP-mngygqYdKGAZ5E2dcnmQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2021 at 7:14 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Jan 7, 2021 at 3:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 6, 2021 at 3:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 6, 2021 at 2:13 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > I think it makes sense. If there can be a race between the tablesync
> > > > re-launching (after error), and the AlterSubscription_refresh removing
> > > > some table’s relid from the subscription then there could be lurking
> > > > slot/origin tablesync resources (of the removed table) which a
> > > > subsequent DROP SUBSCRIPTION cannot discover. I will think more about
> > > > how/if it is possible to make this happen. Anyway, I suppose I ought
> > > > to refactor/isolate some of the tablesync cleanup code in case it
> > > > needs to be commonly called from DropSubscription and/or from
> > > > AlterSubscription_refresh.
> > > >
> > >
> > > Fair enough.
> > >
> >
> > I think before implementing, we should once try to reproduce this
> > case. I understand this is a timing issue and can be reproduced only
> > with the help of debugger but we should do that.
>
> FYI, I was able to reproduce this case in debugger. PSA logs showing details.
>

Thanks for reproducing this; I was worried about exactly this case. I
have one question about the logs:

##
## ALTER SUBSCRIPTION to REFRESH the publication

## This blocks on some latch until the tablesync worker dies, then it continues
##

Did you check exactly which latch or lock blocks this? It is important
to retain this interlock, as otherwise, even if we decide to drop the
slot (and/or origin), the tablesync worker might continue.
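
To illustrate the ordering concern, here is a toy sketch (plain Python
threads, not PostgreSQL code). It assumes a hypothetical ToyTablesync
class in which a worker thread stands in for the tablesync worker and a
boolean stands in for its slot/origin. All names are made up for
illustration. The point is only that the drop happens after the wait
for the worker to exit, never before.

```python
import threading
import time


class ToyTablesync:
    """Toy model only: not PostgreSQL code. All names are hypothetical."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.slot_exists = True        # stands in for the tablesync slot/origin
        self.used_after_drop = False   # set if the worker sees a dropped slot
        self.worker = threading.Thread(target=self._run)

    def _run(self):
        # The worker keeps using its slot until it is asked to stop.
        while not self.stop_requested.is_set():
            if not self.slot_exists:
                self.used_after_drop = True
            time.sleep(0.001)

    def start(self):
        self.worker.start()

    def refresh_and_drop(self):
        # Interlock: request the stop, then block until the worker has exited.
        self.stop_requested.set()
        self.worker.join()
        # Only now is it safe to drop the resources.
        self.slot_exists = False


def demo():
    sync = ToyTablesync()
    sync.start()
    time.sleep(0.01)           # let the worker run for a moment
    sync.refresh_and_drop()
    return sync.worker.is_alive(), sync.slot_exists, sync.used_after_drop
```

If the join() were removed, the worker could still be running when the
slot is dropped, which is the situation the interlock is meant to
prevent.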

--
With Regards,
Amit Kapila.
