Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
Date: 2025-08-21 12:03:02
Message-ID: aKcK9vxLVldxFDT2@depesz.com
Lists: pgsql-general

On Thu, Aug 21, 2025 at 12:41:44PM +0100, Thom Brown wrote:
> Ah, yeah I meant transparent hugepage:
> cat /sys/kernel/mm/transparent_hugepage/enabled
> This should show it being set as "never".

Ah. Sorry, couldn't decipher. Yes, it's "never".

> > # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_220000.csv | uniq -c | grep -C3 -P '^\s*\d\d'
> > 2 2025-08-19 22:09:29.084 UTC
> > 1 2025-08-19 22:09:29.094 UTC
> > 2 2025-08-19 22:09:29.097 UTC
> > 70 2025-08-19 22:09:29.109 UTC
> > 90 2025-08-19 22:09:29.110 UTC
> > 6 2025-08-19 22:09:29.111 UTC
> > 1 2025-08-19 22:09:29.153 UTC
> > 1 2025-08-19 22:09:29.555 UTC

> > 22:10:54 all 2.41 0.00 0.28 0.22 0.00 0.10 0.00 0.00 0.00 96.99
> > 22:10:59 all 2.83 0.00 0.29 0.19 0.00 0.12 0.00 0.00 0.00 96.57
>
> This output looks fine; it doesn't show anything concerning, which
> suggests the issue is somehow on the Postgres side.
>
> Did you happen to poll pg_stat_activity at the time to see whether you
> had lots of IPC waits? I'm wondering whether the storage layer is
> freezing up for a moment.

So, we capture select * from pg_stat_activity, for client backends that
are not idle, every 29 seconds.
That means a 1 second "freeze" is practically impossible to catch: a
given freeze overlaps a sample only about 1 time in 29.

Plus - I suspect that if I ran select * from pg_stat_activity while "in
freeze", it would also get frozen.
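A query issued during a stall would itself stall, so one option is to turn that into the measurement: time a cheap probe from the client side at a short interval and report every run that takes too long. Below is a minimal sketch, assuming GNU coreutils (date +%s%N, date -d @SECONDS.FRACTION, fractional sleep). probe_latency and its parameters are hypothetical, not an existing tool.

```shell
#!/usr/bin/env bash
# probe_latency THRESHOLD_MS COUNT INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, sleeping INTERVAL seconds
# between runs, and prints one line for every run that took at least
# THRESHOLD_MS milliseconds of wall-clock time. A probe issued during a
# stall is stalled too, so its own duration is the signal.
# Assumes GNU coreutils (date +%s%N, fractional "sleep").
probe_latency() {
    local threshold_ms="$1" count="$2" interval="$3"
    shift 3
    local i=0 start end elapsed_ms secs msecs started_at
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)               # nanoseconds since the epoch
        "$@" >/dev/null 2>&1              # discard the probe's own output
        end=$(date +%s%N)
        elapsed_ms=$(( (end - start) / 1000000 ))
        if [ "$elapsed_ms" -ge "$threshold_ms" ]; then
            # Start time of the slow probe, formatted like csvlog log_time.
            secs=$(( start / 1000000000 ))
            msecs=$(printf '%03d' $(( start / 1000000 % 1000 )))
            started_at=$(date -u -d "@${secs}.${msecs}" '+%F %T.%3N UTC')
            printf '%s slow probe: %d ms\n' "$started_at" "$elapsed_ms"
        fi
        sleep "$interval"
        i=$(( i + 1 ))
    done
}
```

For example, probe_latency 200 3000 0.1 psql -XAtc 'select 1' (connection parameters assumed to come from the environment) would probe the replica roughly ten times per second for at least five minutes and print the UTC start time of any probe that took 200 ms or more, ready to be lined up against the csvlog timestamps. The timing includes psql connection startup, so the threshold has to sit well above the normal connect time.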

Anyway, I have data from 22:09:22 and 22:09:51. In both snapshots there
were only 4 non-idle backends, so 8 rows in total.

6 of them had NULL in wait_event*.

One was Client/ClientRead and one was IPC/BgWorkerShutdown.

The state_change for the IPC/BgWorkerShutdown backend was 2025-08-19
22:09:51.79504+00, about 22.7 seconds after the 22:09:29.109 burst, so it
was well past the moment when the problem struck.
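To check whether the same signature (a burst of log lines sharing one log_time millisecond) shows up at other times, the grep | uniq -c pipeline quoted above can be generalized to a whole log file. A minimal sketch, assuming csvlog rows start with an unquoted log_time in a named time zone such as UTC; find_log_bursts and its threshold are hypothetical, not an existing tool.

```shell
#!/usr/bin/env bash
# find_log_bursts [MIN]  (reads a PostgreSQL csvlog on stdin)
# Hypothetical helper: prints every log_time value shared by at least MIN
# (default 10) consecutive log rows, as "COUNT TIMESTAMP".
find_log_bursts() {
    local min="${1:-10}"
    # csvlog rows start with log_time, e.g. "2025-08-19 22:09:29.109 UTC".
    # Continuation lines of multi-line messages do not match and are skipped.
    # uniq -c only merges adjacent duplicates, so this relies on the log being
    # (roughly) in time order, just like the original pipeline.
    grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ [A-Z]+' \
        | uniq -c \
        | awk -v min="$min" '$1 + 0 >= min + 0'
}
```

For example, find_log_bursts 50 < postgresql-2025-08-19_220000.csv would list every millisecond in that file with 50 or more log rows, which should include the 70-row and 90-row bursts at 22:09:29.109 and 22:09:29.110 if the sketch's assumptions hold.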

Best regards,

depesz
