Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-05-08 06:33:48
Message-ID: CAA4eK1JjRRwdT8=d86mrp-vX2YwWNXv5TUid7sb4fe-EmUXDWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 8, 2023 at 11:08 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, May 8, 2023 at 12:52 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Monday, May 8, 2023 11:08 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> >
> > Hi,
> >
> > >
> > > On Tue, May 2, 2023 at 12:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > wrote:
> > > >
> > > > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada
> > > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > While investigating this issue, I've reviewed the code around
> > > > > callbacks and worker termination etc and I found a problem.
> > > > >
> > > > > A parallel apply worker calls the before_shmem_exit callbacks in the
> > > > > following order:
> > > > >
> > > > > 1. ShutdownPostgres()
> > > > > 2. logicalrep_worker_onexit()
> > > > > 3. pa_shutdown()
> > > > >
> > > > > Since the worker is detached during logicalrep_worker_onexit(),
> > > > > MyLogicalReplication->leader_pid is an invalid when we call
> > > > > pa_shutdown():
> > > > >
> > > > > static void
> > > > > pa_shutdown(int code, Datum arg)
> > > > > {
> > > > > Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> > > > > SendProcSignal(MyLogicalRepWorker->leader_pid,
> > > > > PROCSIG_PARALLEL_APPLY_MESSAGE,
> > > > > InvalidBackendId);
> > > > >
> > > > > Also, if the parallel apply worker fails shm_toc_lookup() during the
> > > > > initialization, it raises an error (because of noError = false) but
> > > > > ends up a SEGV as MyLogicalRepWorker is still NULL.
> > > > >
> > > > > I think that we should not use MyLogicalRepWorker->leader_pid in
> > > > > pa_shutdown() but instead store the leader's pid to a static variable
> > > > > before registering pa_shutdown() callback.
> > > > >
> > > >
> > > > Why not simply move the registration of pa_shutdown() to someplace
> > > > after logicalrep_worker_attach()?
> > >
> > > If we do that, the worker won't call dsm_detach() if it raises an
> > > ERROR in logicalrep_worker_attach(), is that okay? It seems that it's
> > > no practically problem since we call dsm_backend_shutdown() in
> > > shmem_exit(), but if so why do we need to call it in pa_shutdown()?
> >
> > I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach
> > callbacks to give callback a chance to report stat before the stat system is
> > shutdown, following what we do in ParallelWorkerShutdown() (e.g.
> > sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so we
> > need to fire that earlier).
> >
> > But for parallel apply, we currently only have one on_dsm_detach
> > callback(shm_mq_detach_callback) which doesn't report extra stats. So the
> > dsm_detach in pa_shutdown is only used to make it a bit future-proof in case
> > we add some other on_dsm_detach callbacks in the future which need to report
> > stats.
>
> Make sense . Given that it's possible that we add other callbacks that
> report stats in the future, I think it's better not to move the
> position to register pa_shutdown() callback.
>

Hmm, what kind of stats do we expect to be collected before we
register pa_shutdown? I think if required we can register such a
callback after pa_shutdown. I feel without reordering the callbacks,
the fix would be a bit complicated as explained in my previous email,
so I don't think it is worth complicating this code unless really
required.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-05-08 07:10:09 Re: SQL:2011 application time
Previous Message Masahiko Sawada 2023-05-08 05:38:19 Re: Perform streaming logical transactions by background workers and parallel apply