Re: Single transaction in the tablesync worker?

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-01 07:38:16
Message-ID: CAHut+PtoVRFJRKFzesaiBDRj7VFXRQERVK9Af_vwRF9TRHzSDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 5:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> > > AFAIK there is always a potential race with DropSubscription dropping
> > > > slots. The DropSubscription might be running at exactly the same time
> > > > the apply worker has just dropped the very same tablesync slot.
> > > >
> > >
> > > We stopped the workers before getting a list of NotReady relations and
> > > then we try to drop the corresponding slots. So, how such a race
> > > condition can happen? Note, because we have a lock on pg_subscrition,
> > > there is no chance that the workers can restart till the transaction
> > > end.
> >
> > OK. I think I was forgetting the logicalrep_worker_stop would also go
> > into a loop waiting for the worker process to die. So even if the
> > tablesync worker does simultaneously drop it's own slot, I think it
> > will certainly at least be in SYNCDONE state before DropSubscription
> > does anything else with that worker.
> >
>
> How is that ensured? We don't have anything like HOLD_INTERRUPTS
> between the time dropped the slot and updated rel state as SYNCDONE.
> So, isn't it possible that after we dropped the slot and before we
> update the state, the SIGTERM signal arrives and led to worker exit?
>

The worker has the SIGTERM handler of "die". IIUC the "die" function
doesn't normally do anything except set some flags to say please die
at the next convenient opportunity. My understanding is that the
worker process will not actually exit until it next executes
CHECK_FOR_INTERRUPTS(), whereupon it will see the ProcDiePending flag
and *really* die. So even if the SIGTERM signal arrives immediately
after the slot is dropped, the tablesync will still become SYNCDONE.
Is this wrong understanding?

But your scenario could still be possible if "die" exited immediately
(e.g. only in single user mode?).

----
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-02-01 07:39:48 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Bharath Rupireddy 2021-02-01 07:13:35 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit