Re: subscriptionCheck failures

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: subscriptionCheck failures
Date: 2021-03-16 07:15:15
Message-ID: CALDaNm1y3XXn7mAq+VRqOaM4PY7d2o310OnwRfF9zEXTJ4Z28g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 12:29 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > >
> > > Hi,
> > >
> > > This seems to be a new low frequency failure, I didn't see it mentioned already:
> > >
> >
> > Thanks for reporting, I'll look into it.
> >
>
> Looking at the logs [1] in the buildfarm, I think I know what is
> going on here. After Create Subscription, the tablesync worker is
> launched and tries to create the slot for doing the initial copy, but
> before it could finish creating the slot, we issued the Drop
> Subscription, which first stops the tablesync worker and then tries to
> drop its slot. Now, it is quite possible that by the time Drop
> Subscription tries to drop the tablesync slot, it is not yet created.
> We treat this condition as okay and just log a message. I don't think
> this is an issue because generally such a slot created on the
> server will be dropped before we persist it, but the test was checking
> for the existence of slots on the server before it gets dropped. I think
> we can avoid such a situation by preventing cancel/die interrupts while
> creating the tablesync slot.
>
> This is a timing issue, so I have reproduced it via a debugger and
> tested that the attached patch fixes it.
>

Thanks for the patch.
I was able to reproduce the issue using a debugger by making it wait at
CreateReplicationSlot. After applying the patch, the issue no longer occurs.
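
For anyone following along, the race described above can be sketched as
a small stand-alone toy model in C. This is not the patch and not
PostgreSQL code: SimState and the sim_* functions are hypothetical names
invented for illustration, and the "hold interrupts" step only loosely
mimics what the backend's HOLD_INTERRUPTS()/RESUME_INTERRUPTS() macros
do. I am assuming the fix works along these lines; the attached patch is
the authority.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these fields loosely mimic backend interrupt state. */
typedef struct SimState
{
    int  holdoff_count;     /* > 0 means interrupts are held off */
    bool die_pending;       /* a terminate request has arrived */
    bool worker_exited;     /* worker acted on the terminate request */
    bool request_in_flight; /* publisher has not finished creating the slot */
    bool slot_exists;       /* slot is present on the publisher */
} SimState;

/* The worker exits only if a request is pending and nothing holds it off. */
static void
sim_check_for_interrupts(SimState *s)
{
    if (s->die_pending && s->holdoff_count == 0)
        s->worker_exited = true;
}

/* The publisher completes a slot creation request that is still in flight. */
static void
sim_publisher_finish(SimState *s)
{
    if (s->request_in_flight)
    {
        s->request_in_flight = false;
        s->slot_exists = true;
    }
}

/* Returns true if the simulated schedule leaves a dangling slot behind. */
bool
sim_dangling_slot(bool hold_interrupts)
{
    SimState s = {0, false, false, false, false};

    /* Tablesync worker: start creating the slot. */
    if (hold_interrupts)
        s.holdoff_count++;
    s.request_in_flight = true;

    /* Drop Subscription: ask the worker to stop. */
    s.die_pending = true;

    /* Worker: reaches an interrupt check while waiting for the reply. */
    sim_check_for_interrupts(&s);

    if (!s.worker_exited)
    {
        /* Worker kept waiting, so the reply arrives before it exits. */
        sim_publisher_finish(&s);
        if (hold_interrupts)
            s.holdoff_count--;
        sim_check_for_interrupts(&s);
    }

    /* Drop Subscription: worker has stopped; drop the slot if it exists. */
    if (s.slot_exists)
        s.slot_exists = false;

    /* Publisher: finishes any request the worker abandoned. */
    sim_publisher_finish(&s);

    return s.slot_exists;
}
```

In this model, sim_dangling_slot(false) leaves the slot behind because
the worker exits while the request is still in flight, whereas
sim_dangling_slot(true) lets the creation finish first, so the drop
finds the slot and removes it.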

A few minor comments:
1) "subscrition" should be "subscription" in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
*/

2) "finishes this command a concurrent drop" should be "finishes this
command, a concurrent drop" in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
*/

Regards,
Vignesh
