Re: Perform streaming logical transactions by background workers and parallel apply

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-04-28 06:18:01
Message-ID: CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 28, 2023 at 11:51 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > >
> > > IIUC, that assert will fail in case of any error raised between
> > > ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and
> > > ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeC
> > > onnectionByOid()->InitPostgres().
> >
> > Thanks for reporting the issue.
> >
> > I think the problem is that it tried to release locks in
> > logicalrep_worker_onexit() before the initialization of the process is complete
> > because this callback function was registered before the init phase. So I think we
> > can add a conditional statement before releasing locks. Please find an attached
> > patch.
> >
>
> Alexander, does the proposed patch fix the problem you are facing?
> Sawada-San, and others, do you see any better way to fix it than what
> has been proposed?

I'm concerned that the idea of relying on IsNormalProcessingMode()
might not be robust since if we change the meaning of
IsNormalProcessingMode() some day it would silently break again. So I
prefer using something like InitializingApplyWorker, or another idea
would be to do cleanup work (e.g., fileset deletion and lock release)
in a separate callback that is registered after connecting to the
database.

While investigating this issue, I've reviewed the code around
callbacks and worker termination etc and I found a problem.

A parallel apply worker calls the before_shmem_exit callbacks in the
following order:

1. ShutdownPostgres()
2. logicalrep_worker_onexit()
3. pa_shutdown()

Since the worker is detached during logicalrep_worker_onexit(),
MyLogicalReplication->leader_pid is an invalid when we call
pa_shutdown():

static void
pa_shutdown(int code, Datum arg)
{
Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
SendProcSignal(MyLogicalRepWorker->leader_pid,
PROCSIG_PARALLEL_APPLY_MESSAGE,
InvalidBackendId);

Also, if the parallel apply worker fails shm_toc_lookup() during the
initialization, it raises an error (because of noError = false) but
ends up a SEGV as MyLogicalRepWorker is still NULL.

I think that we should not use MyLogicalRepWorker->leader_pid in
pa_shutdown() but instead store the leader's pid to a static variable
before registering pa_shutdown() callback. And probably we can
remember the backend id of the leader apply worker to speed up
SendProcSignal().

FWIW, we might need to be careful about the timing when we call
logicalrep_worker_detach() in the worker's termination process. Since
we rely on IsLogicalParallelApplyWorker() for the parallel apply
worker to send ERROR messages to the leader apply worker, if an ERROR
happens after logicalrep_worker_detach(), we will end up with the
assertion failure.

if (IsLogicalParallelApplyWorker())
SendProcSignal(pq_mq_parallel_leader_pid,
PROCSIG_PARALLEL_APPLY_MESSAGE,
pq_mq_parallel_leader_backend_id);
else
{
Assert(IsParallelWorker());

It normally would be a should-no-happen case, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-04-28 06:44:04 Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Previous Message Pavel Stehule 2023-04-28 05:00:10 Re: proposal: psql: show current user in prompt