walreceiver that is behind doesn't quit, send replies

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: walreceiver that is behind doesn't quit, send replies
Date: 2021-05-11 02:27:55
Message-ID: 20210511022755.wcb5h4czadxsgewt@alap3.anarazel.de
Lists: pgsql-hackers


There are no interrupt checks in the WalReceiverMain() sub-loop for
receiving WAL. There's one above

/* See if we can read data immediately */
len = walrcv_receive(wrconn, &buf, &wait_fd);

but none in the loop below:
/*
 * Process the received data, and any subsequent data we
 * can read without blocking.
 */
for (;;)

Similarly, that inner loop doesn't send status updates or fsync while
there's network data - but that matters a bit less, because we'll
send status updates upon request, and flush WAL at segment boundaries.

This may explain why a low-ish wal_sender_timeout /
wal_receiver_status_interval combo still sees plenty of timeouts.

I suspect this is a lot easier to hit when the IO system on the standby
is the bottleneck (with the kernel slowing us down inside the
pg_pwrite()), because that makes it easier to always have incoming
network data.

It's probably not a good idea to just remove that two-level loop - we
don't want to fsync at a much higher rate. But just putting a
ProcessWalRcvInterrupts() in the inner loop also seems unsatisfying; we
should respect wal_receiver_status_interval as well...
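
To make the idea concrete, here is a minimal, self-contained sketch of an
inner drain loop that both checks for interrupts and sends time-based
status replies. It is a toy simulation, not walreceiver code: SimState and
every sim_* function are hypothetical stand-ins, the clock is simulated,
and status_interval_ms merely plays the role of
wal_receiver_status_interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state; none of this is a PostgreSQL API. */
typedef struct SimState
{
    long now_ms;             /* simulated clock */
    long last_reply_ms;      /* time of the last status reply */
    long status_interval_ms; /* stand-in for wal_receiver_status_interval */
    int  chunks_available;   /* data readable without blocking */
    long shutdown_at_ms;     /* when a shutdown request arrives, -1 = never */
    int  replies_sent;
    int  chunks_processed;
} SimState;

/* Stand-in for an interrupt check: has the shutdown request arrived yet? */
static bool
sim_interrupt_pending(const SimState *s)
{
    return s->shutdown_at_ms >= 0 && s->now_ms >= s->shutdown_at_ms;
}

/* Send a status reply only if the configured interval has elapsed. */
static void
sim_maybe_send_reply(SimState *s)
{
    if (s->now_ms - s->last_reply_ms >= s->status_interval_ms)
    {
        s->replies_sent++;
        s->last_reply_ms = s->now_ms;
    }
}

/*
 * Drain all immediately available data. Returns true if the data ran out,
 * false if the loop was left early because of a pending interrupt.
 */
static bool
sim_drain_inner_loop(SimState *s, long ms_per_chunk)
{
    assert(ms_per_chunk > 0);

    while (s->chunks_available > 0)
    {
        /* Without this check a never-empty input keeps us here forever. */
        if (sim_interrupt_pending(s))
            return false;

        /* "Write" one chunk; a slow IO system means a large ms_per_chunk. */
        s->chunks_available--;
        s->chunks_processed++;
        s->now_ms += ms_per_chunk;

        /* Time-based reply, independent of whether more data is pending. */
        sim_maybe_send_reply(s);
    }
    return true;
}
```

With 100 chunks at 100 ms each and a 1000 ms interval, the loop sends a
reply roughly once per interval instead of none until the data runs out,
and a shutdown request arriving mid-stream ends the loop at the next
iteration. Note the sketch deliberately leaves fsync frequency out of the
picture.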

A couple of times I've gotten into a situation where I was shutting down
the primary while the standby was behind, and the system appeared to
just lock up, with neither primary nor standby reacting to normal
shutdown attempts. This seems to happen more often with larger WAL
segment sizes...


Andres Freund

