From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables) |
Date: | 2023-08-11 17:51:11 |
Message-ID: | 20230811175111.3h7pwmezbvoglb5t@awork3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2023-08-11 15:31:43 +0200, Tomas Vondra wrote:
> That's an awful lot of CPU for a cluster doing essentially "nothing"
> (there's no WAL to decode/replicate, etc.). It usually disappears after
> a couple seconds, but sometimes it's a rather persistent state.
Ugh, that's not great.
> The profile from the walsender processes looks like this:
>
>     --99.94%--XLogSendLogical
>               |
>               |--99.23%--XLogReadRecord
>               |          XLogReadAhead
>               |          XLogDecodeNextRecord
>               |          ReadPageInternal
>               |          logical_read_xlog_page
>               |          |
>               |          |--97.80%--WalSndWaitForWal
>               |          |          |
>               |          |          |--68.48%--WalSndWait
>
> It seems to me the issue is in WalSndWait, which was reworked to use
> ConditionVariableCancelSleep() in bc971f4025c. The walsenders end up
> waking each other in a busy loop, until the timing changes just enough
> to break the cycle.
IMO ConditionVariableCancelSleep()'s behaviour of waking up additional
processes can nearly be considered a bug, at least when combined with
ConditionVariableBroadcast(). In that case the wakeup is never needed, and it
can cause situations like this, where condition variables basically
deteriorate to a busy loop.
I hit this with AIO as well. I've been "solving" it by adding a
ConditionVariableCancelSleepEx(), which has an only_broadcasts argument.
I'm inclined to think that any code that needs the forwarding behaviour
is pretty much buggy.
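To make the failure mode concrete, here is a toy model of the wait /
cancel / re-queue cycle. It is only a sketch, not PostgreSQL code: ToyCV,
the toy_* functions, the fixed scheduling order and the "forward" flag are
all made up for illustration, and the flag is merely a stand-in for some way
of switching the forwarding off, not the actual semantics of only_broadcasts.
The assumption being modelled is that cancelling a sleep after having already
been woken passes the wakeup on to another queued waiter.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 16

/* Toy stand-in for a condition variable; NOT PostgreSQL's ConditionVariable. */
typedef struct ToyCV
{
    int  nwaiters;
    bool on_list[MAX_WAITERS];    /* waiter is queued on the CV */
    bool latch_set[MAX_WAITERS];  /* waiter has a pending wakeup */
} ToyCV;

/* Start with every waiter queued and nobody woken. */
void
toy_init(ToyCV *cv, int nwaiters)
{
    assert(nwaiters >= 0 && nwaiters <= MAX_WAITERS);
    cv->nwaiters = nwaiters;
    for (int i = 0; i < MAX_WAITERS; i++)
    {
        cv->on_list[i] = (i < nwaiters);
        cv->latch_set[i] = false;
    }
}

/* Wake one queued waiter, if any (lowest index first, for simplicity). */
void
toy_signal(ToyCV *cv)
{
    for (int i = 0; i < cv->nwaiters; i++)
    {
        if (cv->on_list[i])
        {
            cv->on_list[i] = false;
            cv->latch_set[i] = true;
            return;
        }
    }
}

/* Wake every queued waiter. */
void
toy_broadcast(ToyCV *cv)
{
    for (int i = 0; i < cv->nwaiters; i++)
    {
        if (cv->on_list[i])
        {
            cv->on_list[i] = false;
            cv->latch_set[i] = true;
        }
    }
}

/*
 * Cancel a sleep.  If we are still queued, just dequeue.  If somebody
 * already dequeued us (we were woken), optionally forward the wakeup.
 */
void
toy_cancel_sleep(ToyCV *cv, int self, bool forward)
{
    if (cv->on_list[self])
        cv->on_list[self] = false;
    else if (forward)
        toy_signal(cv);
}

/*
 * Issue a single broadcast, then let woken waiters run their loop:
 * wake up, cancel the sleep, queue up again.  Returns the number of
 * wakeups processed, capped at max_wakeups.
 */
int
toy_wakeups_after_broadcast(int nwaiters, bool forward, int max_wakeups)
{
    ToyCV cv;
    int   wakeups = 0;

    toy_init(&cv, nwaiters);
    toy_broadcast(&cv);

    while (wakeups < max_wakeups)
    {
        int self = -1;

        /* pick the first waiter with a pending wakeup */
        for (int i = 0; i < cv.nwaiters; i++)
        {
            if (cv.latch_set[i])
            {
                self = i;
                break;
            }
        }
        if (self < 0)
            break;              /* everyone is asleep again */

        cv.latch_set[self] = false;
        wakeups++;
        toy_cancel_sleep(&cv, self, forward);
        cv.on_list[self] = true;    /* go back to waiting */
    }
    return wakeups;
}
```

With three waiters, toy_wakeups_after_broadcast(3, false, 1000) settles
after 3 wakeups, one per waiter. With forward = true the same call runs into
the cap of 1000: every waiter that wakes up hands a fresh wakeup to whoever
has just re-queued, so in this fixed schedule the cycle never dies out. Real
scheduling is not this regular, which would fit the storm eventually
stopping once the timing shifts.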
Greetings,
Andres Freund