Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>, Chris Wilson <chris+google(at)qwirx(dot)com>
Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
Date: 2025-08-22 15:30:22
Message-ID: aKiNDmLNsNe0OEio@depesz.com
Lists: pgsql-general

On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> > I got a repeatable case today. It is breaking on its own every
> > ~ 5 minutes.
>
> Interesting. That futex call is presumably caused by interaction
> with some other process within the standby server, and the only
> plausible candidate really is the startup process (which is replaying
> WAL received from the primary). There are cases where WAL replay
> will take locks that can block queries on the standby. Can you
> correlate the delays on the standby server with any DDL events
> occurring on the primary?

Nope. Plus, these cases repeat with a certain regularity, so even if I
missed *some* create table/alter, DDL just isn't going to be happening
every 4-5 minutes.

For example, looking at the logs for the last ~2h, and checking only the
cases where there are 20 or more messages in the same millisecond,
I can see (message count, then timestamp):

108 14:02:03.149
25 14:04:01.619
110 14:05:36.924
77 14:05:36.925
108 14:09:28.155
38 14:13:52.481
63 14:13:52.482
73 14:13:52.484
146 14:18:19.338
39 14:18:19.339
24 14:20:01.694
82 14:23:07.352
55 14:23:07.353
37 14:23:07.353
45 14:27:44.125
132 14:27:44.126
109 14:31:41.593
70 14:31:41.594
24 14:32:01.205
21 14:34:01.477
79 14:35:36.761
104 14:35:36.762
22 14:39:49.541
151 14:39:49.542
22 14:39:49.543
112 14:44:15.607
73 14:44:15.608
28 14:48:01.256
50 14:48:25.588
131 14:48:25.589
139 14:52:44.391
74 14:57:02.369
117 14:57:02.370
20 15:00:02.008
137 15:00:43.982
34 15:00:43.983
20 15:01:01.110
22 15:04:21.037
153 15:04:21.038
20 15:08:01.136
31 15:08:55.798
126 15:08:55.799
76 15:13:46.654
83 15:13:46.655
20 15:17:01.700
107 15:18:42.112
72 15:18:42.113
124 15:23:48.689
32 15:23:48.690
25 15:23:48.690
28 15:24:01.397
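
A tally like the one above can be produced with a short script. This is
only a minimal sketch, not the command that was actually used. It assumes
every log line starts with a timestamp such as "2025-08-22 14:02:03.149"
(what log_line_prefix with %m would give); the function name `bursts` and
the regex are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each log line begins with "YYYY-MM-DD HH:MM:SS.mmm".
# Adjust the pattern if your log_line_prefix differs.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} (\d{2}:\d{2}:\d{2}\.\d{3})")

def bursts(lines, threshold=20):
    """Return (count, timestamp) pairs for every millisecond that has
    at least `threshold` log lines, in timestamp order."""
    counts = Counter()
    for line in lines:
        m = TS_RE.match(line)
        if m:  # lines without a leading timestamp are skipped
            counts[m.group(1)] += 1
    return [(n, ts) for ts, n in sorted(counts.items()) if n >= threshold]
```

Passing it an open log file, e.g. `bursts(open(path))` with `path` pointing
at the standby's log, yields pairs in the same "count timestamp" shape as
the list above.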

So, while there are outliers, I'd say that most of the problems happen every
3-5 minutes.
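
To put rough numbers on the spacing, the timestamps can be merged into
distinct events and the gaps between them computed. This is a sketch under
two assumptions of mine, not anything established in the thread: rows less
than one second apart belong to the same event, and only events with at
least 100 messages in total count as the "big" stalls. The name
`event_gaps` and both cutoffs are illustrative.

```python
from datetime import datetime, timedelta

def event_gaps(rows, merge_within=timedelta(seconds=1), min_total=100):
    """rows: (count, "HH:MM:SS.mmm") pairs in time order, all on one day.
    Merge rows closer together than merge_within into one event, keep
    events with at least min_total messages, and return the gaps between
    consecutive kept events, in seconds."""
    events = []  # each item: [first_time, last_time, total_count]
    for count, ts in rows:
        t = datetime.strptime(ts, "%H:%M:%S.%f")
        if events and t - events[-1][1] <= merge_within:
            events[-1][1] = t
            events[-1][2] += count
        else:
            events.append([t, t, count])
    big = [e[0] for e in events if e[2] >= min_total]
    return [(b - a).total_seconds() for a, b in zip(big, big[1:])]

# First five rows of the list above.
sample = [
    (108, "14:02:03.149"),
    (25, "14:04:01.619"),
    (110, "14:05:36.924"),
    (77, "14:05:36.925"),
    (108, "14:09:28.155"),
]
print(event_gaps(sample))
```

For these five rows the two gaps come out at about 214 and 231 seconds,
i.e. a bit over 3.5 minutes each; the 25-message row is dropped by the
(assumed) 100-message cutoff.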

depesz
