walsender waiting_for_ping spuriously set

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: ashutosh(dot)bapat(at)2ndquadrant(dot)com
Subject: walsender waiting_for_ping spuriously set
Date: 2020-08-06 22:55:58
Message-ID: 20200806225558.GA22401@alvherre.pgsql
Lists: pgsql-hackers

Ashutosh Bapat noticed that WalSndWaitForWal() sets
waiting_for_ping_response after sending a keepalive that does *not*
request a reply. The bad consequence is that other callers, which do
require a reply, end up not sending a keepalive because they think one
was already sent. So the whole thing gets stuck.
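A minimal sketch of the failure, assuming a heavily simplified model: send_keepalive(), wait_for_wal_path(), timeout_path() and the keepalives_requesting_reply counter are hypothetical stand-ins, not the actual walsender.c code. One path sends a keepalive that requests no reply but still sets the flag; the path that actually needs a reply then skips sending one.

```c
#include <assert.h>   /* for the usage checks described after this block */
#include <stdbool.h>

/* Hypothetical, simplified model of the walsender keepalive state. */
static bool waiting_for_ping_response = false;
static int  keepalives_requesting_reply = 0;  /* replies actually asked for */

/* Stand-in for WalSndKeepalive(request): only records what was sent. */
static void send_keepalive(bool request_reply)
{
    if (request_reply)
        keepalives_requesting_reply++;
}

/* Buggy pattern: no reply is requested, yet the flag is set anyway. */
static void wait_for_wal_path(void)
{
    send_keepalive(false);
    waiting_for_ping_response = true;   /* the spurious assignment */
}

/* A caller that does need a reply; it sends one only if it believes
 * no reply-requesting keepalive is outstanding. */
static void timeout_path(void)
{
    if (!waiting_for_ping_response)
    {
        send_keepalive(true);
        waiting_for_ping_response = true;
    }
}
```

Calling wait_for_wal_path() and then timeout_path() leaves waiting_for_ping_response true while keepalives_requesting_reply is still 0: in this model the flag claims a reply is pending, but none was ever requested.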

He found that commit 41d5f8ad734 failed to remove the setting of
waiting_for_ping_response after changing the "request" parameter of
WalSndKeepalive from true to false; that seems to have been an oversight,
and it breaks the algorithm. Thread at [1].

The simplest fix is just to remove the line that sets
waiting_for_ping_response, but I think it is less error-prone to have
WalSndKeepalive set the flag itself, instead of expecting its callers to
do it (and to know when to). Patch attached. It also rewords some related
comments.
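The proposed approach can be sketched in the same simplified model (all names remain hypothetical stand-ins, and the real patch may differ in detail): the keepalive routine owns the flag and sets it only when it actually requests a reply, so callers no longer need to know when to set it. process_reply() is an assumed hook standing in for wherever the reply is handled.

```c
#include <assert.h>   /* for the usage checks described after this block */
#include <stdbool.h>

/* Hypothetical, simplified model of the walsender keepalive state. */
static bool waiting_for_ping_response = false;
static int  keepalives_requesting_reply = 0;  /* replies actually asked for */

/* The keepalive routine sets the flag itself, and only when a reply
 * is requested (message construction and sending omitted). */
static void send_keepalive(bool request_reply)
{
    if (request_reply)
    {
        keepalives_requesting_reply++;
        waiting_for_ping_response = true;
    }
}

/* No reply requested, so the flag is left untouched. */
static void wait_for_wal_path(void)
{
    send_keepalive(false);
}

/* Callers just check the flag; they never set it themselves. */
static void timeout_path(void)
{
    if (!waiting_for_ping_response)
        send_keepalive(true);
}

/* Hypothetical hook: the client's reply has arrived. */
static void process_reply(void)
{
    waiting_for_ping_response = false;
}
```

Here wait_for_wal_path() no longer sets the flag, so a following timeout_path() sends exactly one reply-requesting keepalive, and repeated calls do not send another until process_reply() clears the flag.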

[1] https://postgr.es/m/flat/BLU436-SMTP25712B7EF9FC2ADEB87C522DC040(at)phx(dot)gbl

--
Álvaro Herrera Valdivia, Chile

Attachment: 0001-Fix-waiting_for_ping-in-walsender.patch (text/x-diff, 3.1 KB)
