Re: Race conditions in 019_replslot_limit.pl

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race conditions in 019_replslot_limit.pl
Date: 2022-03-24 19:05:40
Message-ID: 3042597.1648148740@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I just noticed something very interesting: in a couple of recent
buildfarm runs with this failure, the pg_stat_activity printout
no longer shows the extra walsender:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2022-03-24%2017%3A50%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-03-23%2011%3A00%3A05

This is just two of the 33 such failures in the past ten days,
so maybe it's not surprising that we didn't see it already.
(I got bored before looking back further than that.)

What this suggests to me is that maybe the extra walsender is
indeed not blocked on anything, but is just taking its time
about exiting. In these two runs, as well as in all the
non-failing runs, it had enough time to do so.

I suggest that we add a couple-of-seconds sleep in front of
the query that collects walsender PIDs, and maybe a couple more
seconds before the pg_stat_activity probe in the failure path,
and see if the behavior changes at all. That should be enough
to confirm or disprove this idea pretty quickly. If it is
right, a permanent fix could be to wait for the basebackup's
walsender to disappear from node_primary3's pg_stat_activity
before we start the one for node_standby3.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-03-24 19:06:21 Re: Granting SET and ALTER SYSTE privileges for GUCs
Previous Message Andres Freund 2022-03-24 18:47:58 Re: pg_walinspect - a new extension to get raw WAL data and WAL stats