Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Maksim Milyutin <milyutinma(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Date: 2019-01-24 19:56:00
Message-ID: CAEepm=3ynb5nBhKQRts0bNETA1HzNxz6-3RTPOzCbM8oQ9yPdg@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jan 24, 2019 at 11:56 PM Sergei Kornilov <sk(at)zsrv(dot)org> wrote:
> We should not call dsm_backend_shutdown twice in the same process, right? So we tried to call dsm_detach on the same segment 0x5624578710c8 twice, but this is unexpected behavior and refcnt would be incorrect. And it seems we cannot LWLockAcquire a lock and then LWLockAcquire the same lock again without releasing it. And here we have infinite waiting.

Yeah, I think your analysis is right. dsm_unpin_segment() shouldn't
raise an error while holding the lock. It should perhaps release the
lock before it raises an error, something like:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 36904d2676..b989c0b94a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -924,9 +924,15 @@ dsm_unpin_segment(dsm_handle handle)
 	 * called on a segment which is pinned.
 	 */
 	if (control_slot == INVALID_CONTROL_SLOT)
+	{
+		LWLockRelease(DynamicSharedMemoryControlLock);
 		elog(ERROR, "cannot unpin unknown segment handle");
+	}
 	if (!dsm_control->item[control_slot].pinned)
+	{
+		LWLockRelease(DynamicSharedMemoryControlLock);
 		elog(ERROR, "cannot unpin a segment that is not pinned");
+	}
 	Assert(dsm_control->item[control_slot].refcnt > 1);
 
 	/*

I have contemplated that before, but not done it because I'm not sure
about the state of the system after that. We just shouldn't be in this
situation: if we are, it means that we can error out while later
segments (in the array that dsa_release_in_place() loops through)
remain pinned forever, so we'll leak memory and run out of DSM slots.
Segment pinning is opting out of resource owner control, which means
the client code is responsible for not screwing it up. Perhaps that
suggests we should PANIC, or perhaps just LOG and continue, but I'm
not sure.

I think the root cause is earlier and in a different process (see
ProcessInterrupts() in the stack). Presumably the process that reported
"dsa_area could not attach to segment" is closer to the point where
things go wrong. If you are in a position to reproduce this on a
modified source tree, it'd be good to see the backtrace for that, to
figure out which of a couple of possible code paths reaches it. Perhaps
you could do that by enabling core files and changing this:

- elog(ERROR, "dsa_area could not attach to segment");
+ elog(PANIC, "dsa_area could not attach to segment");

I have so far not succeeded in reaching that condition.

--
Thomas Munro
http://www.enterprisedb.com
