walsender performance regression due to logical decoding on standby changes

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: walsender performance regression due to logical decoding on standby changes
Date: 2023-05-09 19:02:47
Message-ID: 20230509190247.3rrplhdgem6su6cg@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Unfortunately I have found the following commit to have caused a performance
regression:

commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: 2023-04-08 00:24:24 -0700

For cascading replication, wake physical and logical walsenders separately

Physical walsenders can't send data until it's been flushed; logical
walsenders can't decode and send data until it's been applied. On the
standby, the WAL is flushed first, which will only wake up physical
walsenders; and then applied, which will only wake up logical
walsenders.

Previously, all walsenders were awakened when the WAL was flushed. That
was fine for logical walsenders on the primary; but on the standby the
flushed WAL would have been not applied yet, so logical walsenders were
awakened too early.

Per idea from Jeff Davis and Amit Kapila.

Author: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Reviewed-By: Jeff Davis <pgsql(at)j-davis(dot)com>
Reviewed-By: Robert Haas <robertmhaas(at)gmail(dot)com>
Reviewed-by: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Reviewed-by: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Discussion: https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkbW7Hqh6vQ@mail.gmail.com

The problem is that, on a standby, after the change - as needed to for the
approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens for
every record, instead of only happening when the timeline is changed (or WAL
is flushed or ...).

WalSndWakeup() iterates over all walsender slots, regardless of whether in
use. For each of the walsender slots it acquires a spinlock.

When replaying a lot of small-ish WAL records I found the startup process to
spend the majority of the time in WalSndWakeup(). I've not measured it very
precisely yet, but the overhead is significant (~35% slowdown), even with the
default max_wal_senders. If that's increased substantially, it obviously gets
worse.

The only saving grace is that this is not an issue on the primary.

I unfortunately spent less time on this commit of the
logical-decoding-on-standby series than on the others. There were so many
other senior contributors discussing it, that I "relaxed" a bit too much.

I don't think the approach of not having any sort of "registry" of whether
anybody is waiting for the replay position to be updated is
feasible. Iterating over all walsenders slots is just too expensive -
WalSndWakeup() shows up even if I remove the spinlock (which we likely could,
especially when just checking if the the walsender is connected).

My current guess is that mis-using a condition variable is the best bet. I
think it should work to use ConditionVariablePrepareToSleep() before a
WalSndWait(), and then ConditionVariableCancelSleep(). I.e. to never use
ConditionVariableSleep(). The latch set from ConditionVariableBroadcast()
would still cause the necessary wakeup.

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-05-09 20:38:24 Re: walsender performance regression due to logical decoding on standby changes
Previous Message Pavel Stehule 2023-05-09 18:31:23 Re: psql tests hangs