walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)
Date: 2023-08-11 13:31:43
Message-ID: 2840876b-4cfe-240f-0a7e-29ffd66711e7@enterprisedb.com
Lists: pgsql-hackers

Hi,

while working on some logical replication stuff, I noticed that on PG16
I often end up with a completely idle publisher (no user activity) that
nevertheless looks like this in top:

... %CPU COMMAND
... 17.9 postgres: walsender user test ::1(43064) START_REPLICATION
... 16.6 postgres: walsender user test ::1(43128) START_REPLICATION
... 15.0 postgres: walsender user test ::1(43202) START_REPLICATION
...  6.6 postgres: walsender user test ::1(43236) START_REPLICATION
...  5.3 postgres: walsender user test ::1(43086) START_REPLICATION
...  4.3 postgres: walsender user test ::1(43180) START_REPLICATION
...  3.7 postgres: walsender user test ::1(43052) START_REPLICATION
...  3.7 postgres: walsender user test ::1(43158) START_REPLICATION
...  3.0 postgres: walsender user test ::1(43108) START_REPLICATION
...  3.0 postgres: walsender user test ::1(43214) START_REPLICATION

That's an awful lot of CPU for a cluster doing essentially "nothing"
(there's no WAL to decode/replicate, etc.). It usually disappears after
a couple seconds, but sometimes it's a rather persistent state.

The profile from the walsender processes looks like this:

--99.94%--XLogSendLogical
|
|--99.23%--XLogReadRecord
| XLogReadAhead
| XLogDecodeNextRecord
| ReadPageInternal
| logical_read_xlog_page
| |
| |--97.80%--WalSndWaitForWal
| | |
| | |--68.48%--WalSndWait

It seems to me the issue is in WalSndWait, which bc971f4025c reworked
to use condition variables (the wait now ends with a
ConditionVariableCancelSleep() call). The walsenders end up waking each
other in a busy loop, until the timing changes just enough to break the
cycle.

I've been unable to reproduce this on PG15, and bc971f4025c seems like
the most significant change to WalSndWait, which is why I suspect it's
related to the issue.
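
For reference, here is a minimal sketch of the pattern I mean, using the
condition variable primitives from storage/condition_variable.h. It is
not the actual walsender code; the CV and function names (wakeup_cv,
waiter_wait, wake_all_walsenders) are made up for illustration:

---------------------------------------------------------------------
/*
 * Minimal sketch of the wait/wakeup pattern (not the actual walsender
 * code; wakeup_cv, waiter_wait and wake_all_walsenders are made-up
 * names).
 */
#include "postgres.h"
#include "storage/condition_variable.h"

static ConditionVariable *wakeup_cv;	/* points into shared memory */

/* waiter side: register on the CV, wait, then deregister */
static void
waiter_wait(void)
{
	/* add ourselves to the CV's wait list before going to sleep */
	ConditionVariablePrepareToSleep(wakeup_cv);

	/*
	 * ... wait on the socket / process latch here; a broadcast on
	 * wakeup_cv sets our latch and wakes us up ...
	 */

	/* take ourselves off the wait list again */
	ConditionVariableCancelSleep();
}

/* wakeup side: wake every process currently registered on the CV */
static void
wake_all_walsenders(void)
{
	ConditionVariableBroadcast(wakeup_cv);
}
---------------------------------------------------------------------

The important property is that the broadcast wakes every walsender
registered on the CV, whether or not it actually has anything to send.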

Reproducing this is simple: create a publisher with multiple
subscriptions (they can all go to the same subscriber instance) and an
empty publication. Triggering a "noop" is then likely to cause the high
CPU usage:

---------------------------------------------------------------------
# init two clusters
pg_ctl -D data-publisher init
pg_ctl -D data-subscriber init

# change the parameters to allow 10 subscriptions
echo 'wal_level = logical' >> data-publisher/postgresql.conf
echo 'port = 5433' >> data-subscriber/postgresql.conf
echo 'max_worker_processes = 20' >> data-subscriber/postgresql.conf
echo 'max_logical_replication_workers = 20' >> data-subscriber/postgresql.conf

# start both clusters
pg_ctl -D data-publisher -l publisher.log start
pg_ctl -D data-subscriber -l subscriber.log start

# setup empty publication
createdb test
psql test -c "create publication p";

# setup 10 subscriptions
for i in `seq 1 10`; do
  createdb -p 5433 test$i
  psql -p 5433 test$i -c "create subscription s$i CONNECTION 'host=localhost port=5432 dbname=test' publication p"
done

# emit logical messages, which are almost a no-op, 5s apart
for i in `seq 1 10`; do
  psql test -c "select pg_logical_emit_message(false, 'x', 'x')"
  sleep 5
done
---------------------------------------------------------------------

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
