Re: Skip partition tuple routing with constant partition key

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Zhihong Yu <zyu(at)yugabyte(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Skip partition tuple routing with constant partition key
Date: 2022-04-13 07:18:59
Message-ID: CAD21AoCkiVnesVyQcUggWtuJqGhn1GCFQ6_s6=c1BU+arK8Zjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 13, 2022 at 2:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Apr 12, 2022 at 6:16 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Thu, Apr 7, 2022 at 4:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > > > Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > > > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> > > > >> Is this a problem with the patch or its tests?
> > > > >> [18:14:20.798] Test Summary Report
> > > > >> [18:14:20.798] -------------------
> > > > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> > > >
> > > > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > > > http://cfbot.cputube.org/amit-langote.html
> > > >
> > > > 013_partition.pl has been failing regularly in the buildfarm,
> > > > most recently here:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
> > >
> > > Just failed locally on my machine as well.
> > >
> > >
> > > > I don't think there's room to blame any uncommitted patches
> > > > for that. Somebody broke it a short time before here:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
> > >
> > > The obvious thing to point a finger at is
> > >
> > > commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> > > Author: Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
> > > Date: 2022-03-16 16:42:47 +0100
> > >
> > > Fix publish_as_relid with multiple publications
> > >
> >
> > I've not managed to reproduce this issue on my machine but while
> > reviewing the code and the server logs[1] I may have found possible
> > bugs:
> >
> > 2022-04-08 12:59:30.701 EDT [91997:1] LOG: logical replication apply
> > worker for subscription "sub2" has started
> > 2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
> > statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
> > pub_all
> > 2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
> > disconnection: session time: 0:00:00.036 user=buildfarm
> > database=postgres host=[local]
> > 2022-04-08 12:59:30.740 EDT [92001:1] LOG: logical replication table
> > synchronization worker for subscription "sub2", table "tab4_1" has
> > started
> > 2022-04-08 12:59:30.744 EDT [91997:2] LOG: logical replication apply
> > worker for subscription "sub2" will restart because of a parameter
> > change
> > 2022-04-08 12:59:30.750 EDT [92003:1] LOG: logical replication table
> > synchronization worker for subscription "sub2", table "tab3" has
> > started
> >
> > The logs say that the apply worker for "sub2" finished whereas the
> > tablesync workers for "tab4_1" and "tab3" started. After these logs,
> > there are no logs that these tablesync workers finished and the apply
> > worker for "sub2" restarted, until the timeout. While reviewing the
> > code, I realized that the tablesync workers can advance its relstate
> > even without the apply worker intervention.
> >
> > After a tablesync worker copies the table it sets
> > SUBREL_STATE_SYNCWAIT to its relstate, then it waits for the apply
> > worker to update the relstate to SUBREL_STATE_CATCHUP. If the apply
> > worker has already died, it breaks from the wait loop and returns
> > false:
> >
> > wait_for_worker_state_change():
> >
> > for (;;)
> > {
> > LogicalRepWorker *worker;
> >
> > :
> >
> > /*
> > * Bail out if the apply worker has died, else signal it we're
> > * waiting.
> > */
> > LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
> > InvalidOid, false);
> > if (worker && worker->proc)
> > logicalrep_worker_wakeup_ptr(worker);
> > LWLockRelease(LogicalRepWorkerLock);
> > if (!worker)
> > break;
> >
> > :
> > }
> >
> > return false;
> >
> > However, the caller doesn't check the return value at all:
> >
> > /*
> > * We are done with the initial data synchronization, update the state.
> > */
> > SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> > MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
> > MyLogicalRepWorker->relstate_lsn = *origin_startpos;
> > SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >
> > /*
> > * Finally, wait until the main apply worker tells us to catch up and then
> > * return to let LogicalRepApplyLoop do it.
> > */
> > wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
> > return slotname;
> >
> > Therefore, the tablesync worker started logical replication while
> > keeping its relstate as SUBREL_STATE_SYNCWAIT.
> >
> > Given the server logs, it's likely that both tablesync workers for
> > "tab4_1" and "tab3" were in this situation. That is, there were two
> > tablesync workers who were applying changes for the target relation
> > but the relstate was SUBREL_STATE_SYNCWAIT.
> >
> > When it comes to starting the apply worker, probably it didn't happen
> > since there are already running tablesync workers as much as
> > max_sync_workers_per_subscription (2 by default):
> >
> > logicalrep_worker_launch():
> >
> > /*
> > * If we reached the sync worker limit per subscription, just exit
> > * silently as we might get here because of an otherwise harmless race
> > * condition.
> > */
> > if (nsyncworkers >= max_sync_workers_per_subscription)
> > {
> > LWLockRelease(LogicalRepWorkerLock);
> > return;
> > }
> >
> > This scenario seems possible in principle but I've not managed to
> > reproduce this issue so I might be wrong.
> >
>
> This is exactly the same analysis I have done in the original thread
> where that patch was committed. I have found some crude ways to
> reproduce it with a different test as well. See emails [1][2][3].

Great. I didn't realize there is a discussion there.

>
> > Especially, according to the
> > server logs, it seems like the tablesync workers were launched before
> > the apply worker restarted due to parameter change and this is a
> > common pattern among other failure logs. But I'm not sure how it could
> > really happen. IIUC the apply worker always re-reads subscription (and
> > exits if there is parameter change) and then requests to launch
> > tablesync workers accordingly.
> >
>
> Is there any rule/documentation which ensures that we must re-read the
> subscription parameter change before trying to launch sync workers?

No, but as far as I read the code I could not find any path of that.

>
> Actually, it would be better if we discuss this problem on another
> thread [1] to avoid hijacking this thread. So, it would be good if you
> respond there with your thoughts. Thanks for looking into this.

Agreed. I'll respond there.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-04-13 07:35:39 Re: Skipping schema changes in publication
Previous Message Michael Paquier 2022-04-13 07:10:01 Re: Add --{no-,}bypassrls flags to createuser