ReplicationSlotsComputeRequiredXmin seems pretty questionable

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: ReplicationSlotsComputeRequiredXmin seems pretty questionable
Date: 2020-08-20 16:58:32
Message-ID: 1606337.1597942712@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While trying to make sense of Adam Sjøgren's problem [1], I found
myself staring at ReplicationSlotsComputeRequiredXmin() in slot.c.
It seems to me that that is very shaky code, on two different
grounds:

1. Sometimes it's called with ProcArrayLock already held exclusively.
This means that any delay in acquiring the ReplicationSlotControlLock
translates directly into a hold on ProcArrayLock; in other words,
every acquisition of the ReplicationSlotControlLock is just as bad
for concurrency as an acquisition of ProcArrayLock. While I didn't
see any places that were doing really obviously slow things while
holding ReplicationSlotControlLock, this is disturbing. Do we really
need it to be like that?

2. On the other hand, the code is *releasing* the
ReplicationSlotControlLock before it calls
ProcArraySetReplicationSlotXmin, and that seems like a flat out
concurrency bug. How can we be sure that the values we're storing
into the shared xmin fields aren't stale by the time we acquire
the ProcArrayLock (in the case where we didn't hold it already)?
I'm concerned that in the worst case this code could make the
shared xmin fields go backwards.

Both of these issues could be solved, I think, if we got rid of
the provision for calling with ProcArrayLock already held and
moved the ProcArraySetReplicationSlotXmin call inside the hold
of ReplicationSlotControlLock. But maybe I'm missing something
about why that would be worse.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/87364kdsim.fsf%40tullinup.koldfront.dk

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-08-20 17:54:02 Re: "ccold" left by reindex concurrently are droppable?
Previous Message Fujii Masao 2020-08-20 15:41:34 Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)