Re: Race condition in InvalidateObsoleteReplicationSlots()

From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <fujii(at)postgresql(dot)org>
Subject: Re: Race condition in InvalidateObsoleteReplicationSlots()
Date: 2021-06-10 22:02:58
Message-ID: 202106102202.mjw4huiix7lo@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's a version that I feel is committable (0001). There was an issue
when returning from the inner loop, if in a previous iteration we had
released the lock. In that case we need to return with the lock not
held, so that the caller can acquire it again, but weren't. This seems
pretty hard to hit in practice (I suppose somebody needs to destroy the
slot just as checkpointer killed the walsender, but before checkpointer
marks it as its own) ... but if it did happen, I think checkpointer
would block with no recourse. Also added some comments and slightly
restructured the code.

There are plenty of conflicts in pg13, but it's all easy to handle.

I wrote a test (0002) to cover the case of signalling a walsender, which
is currently not covered (we only deal with the case of a standby that's
not running). There are some sharp edges in this code -- I had to make
it use background_psql() to send a CHECKPOINT, which hangs, because I
previously send a SIGSTOP to the walreceiver. Maybe there's a better
way to achieve a walreceiver that remains connected but doesn't consume
input from the primary, but I don't know what it is. Anyway, the code
becomes covered with this. I would like to at least see it in master,
to gather some reactions from buildfarm.

--
Álvaro Herrera Valdivia, Chile
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.

Attachment Content-Type Size
0001-the-code-fix.patch text/x-diff 7.9 KB
0002-the-test.patch text/x-diff 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-06-10 22:04:16 Re: unnesting multirange data types
Previous Message Dean Gibson (DB Administrator) 2021-06-10 21:46:27 Re: AWS forcing PG upgrade from v9.6 a disaster