RE: walsender performance regression due to logical decoding on standby changes

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: RE: walsender performance regression due to logical decoding on standby changes
Date: 2023-05-10 10:11:26
Message-ID: OS0PR01MB5716CADAD77D0D720E27607A94779@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Wednesday, May 10, 2023 2:36 PM Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, May 10, 2023 at 12:33 AM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> >
> > Unfortunately I have found the following commit to have caused a
> performance
> > regression:
> >
> > commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
> >
> > The problem is that, on a standby, after the change - as needed to for the
> > approach to work - the call to WalSndWakeup() in ApplyWalRecord()
> happens for
> > every record, instead of only happening when the timeline is changed (or
> WAL
> > is flushed or ...).
> >
> > WalSndWakeup() iterates over all walsender slots, regardless of whether in
> > use. For each of the walsender slots it acquires a spinlock.
> >
> > When replaying a lot of small-ish WAL records I found the startup process to
> > spend the majority of the time in WalSndWakeup(). I've not measured it very
> > precisely yet, but the overhead is significant (~35% slowdown), even with the
> > default max_wal_senders. If that's increased substantially, it obviously gets
> > worse.
>
> I played it with a simple primary -> standby1 -> standby2 setup. I ran
> a pgbench script [1] on primary and counted the number of times
> WalSndWakeup() gets called from ApplyWalRecord() and the number of
> times spinlock is acquired/released in WalSndWakeup(). It's a whopping
> 21 million times spinlock is acquired/released on the standby 1 and
> standby 2 for just a < 5min of pgbench run on the primary:
>
> standby 1:
> 2023-05-10 05:32:43.249 UTC [1595600] LOG: FOO WalSndWakeup() in
> ApplyWalRecord() was called 2176352 times
> 2023-05-10 05:32:43.249 UTC [1595600] LOG: FOO spinlock
> acquisition/release count in WalSndWakeup() is 21763530
>
> standby 2:
> 2023-05-10 05:32:43.249 UTC [1595625] LOG: FOO WalSndWakeup() in
> ApplyWalRecord() was called 2176352 times
> 2023-05-10 05:32:43.249 UTC [1595625] LOG: FOO spinlock
> acquisition/release count in WalSndWakeup() is 21763530
>
> In this case, there is no timeline switch or no logical decoding on
> the standby or such, but WalSndWakeup() gets called because the
> standby can't make out if the slot is for logical or physical
> replication unless spinlock is acquired. Before e101dfac3a,
> WalSndWakeup() was getting called only when there was a timeline
> switch.
>
> > The only saving grace is that this is not an issue on the primary.
>
> Yeah.
>
> > I don't think the approach of not having any sort of "registry" of whether
> > anybody is waiting for the replay position to be updated is
> > feasible. Iterating over all walsenders slots is just too expensive -
> > WalSndWakeup() shows up even if I remove the spinlock (which we likely
> could,
> > especially when just checking if the the walsender is connected).
>
> Right.
>
> > My current guess is that mis-using a condition variable is the best bet. I
> > think it should work to use ConditionVariablePrepareToSleep() before a
> > WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
> > ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
> > would still cause the necessary wakeup.
>
> How about something like the attached? Recovery and subscription tests
> don't complain with the patch.

Thanks for the patch. I noticed one place where the logic is different from before and
just to confirm:

if (AllowCascadeReplication())
- WalSndWakeup(switchedTLI, true);
+ ConditionVariableBroadcast(&WalSndCtl->cv);

After the change, we wakeup physical walsender regardless of switchedTLI flag.
Is this intentional ? if so, I think It would be better to update the comments above this.

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-05-10 10:33:07 Re: walsender performance regression due to logical decoding on standby changes
Previous Message Drouvot, Bertrand 2023-05-10 09:52:35 Re: walsender performance regression due to logical decoding on standby changes