Re: ERROR: too many dynamic shared memory segments

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: ERROR: too many dynamic shared memory segments
Date: 2017-11-27 22:48:54
Message-ID: CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com> wrote:
> As for the crash. I dug up the initial log and it looks like a segmentation
> fault...
>
> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user(at)db:[30003]: ERROR: too
> many dynamic shared memory segments

Hmm. Well, this error can only occur in dsm_create() called without
DSM_CREATE_NULL_IF_MAXSEGMENTS. parallel.c calls it with that flag
and dsa.c doesn't (perhaps it should, not sure, but that'd just change
the error message), so that means the error arose from dsa.c
trying to get more segments. That would be when Parallel Bitmap Heap
Scan tried to allocate memory.
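
To make the difference concrete, here is a minimal sketch of the two
caller-side behaviours (illustrative only, not a quote of parallel.c or
dsa.c; "segsize" is just a placeholder):

    #include "storage/dsm.h"

    /* parallel.c-style call: running out of slots returns NULL, so the
     * caller can degrade gracefully and run the plan without workers. */
    dsm_segment *seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
    if (seg == NULL)
    {
        /* fall back: behave as if zero workers had been requested */
    }

    /* dsa.c-style call: no flag, so hitting the slot limit ereports
     * ERROR: too many dynamic shared memory segments */
    dsm_segment *seg2 = dsm_create(segsize, 0);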

I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
managed to reproduce a segv crash by trying to run concurrent Parallel
Bitmap Heap Scans. The stack looks like this:

 * frame #0: 0x00000001083ace29 postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at dsa.c:1433
   frame #1: 0x00000001083acd14 postgres`dsa_allocate_extended(area=0x0000000000000000, size=72, flags=4) + 1076 at dsa.c:785
   frame #2: 0x0000000108059c33 postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at tidbitmap.c:780
   frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
   frame #4: 0x0000000107fefc5b postgres`ExecScanFetch(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 459 at execScan.c:95
   frame #5: 0x0000000107fef983 postgres`ExecScan(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 147 at execScan.c:162
   frame #6: 0x00000001080008d1 postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at nodeBitmapHeapscan.c:735
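
(As an aside, the "hack" mentioned above was just a local tweak to the
DSM slot accounting. A hypothetical version, assuming the limit is
derived from MaxBackends in dsm.c's startup code, might look roughly
like this; the constant names are from memory and this isn't a patch
I'm proposing:)

    /* reproduction hack (sketch): clamp the number of DSM control slots
     * so a handful of concurrent Parallel Bitmap Heap Scans exhaust them */
    maxitems = PG_DYNSHMEM_FIXED_SLOTS
        + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;
    maxitems = Min(maxitems, 5);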

(lldb) f 3
frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
   153          * dsa_pointer of the iterator state which will be used by
   154          * multiple processes to iterate jointly.
   155          */
-> 156         pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
   157     #ifdef USE_PREFETCH
   158         if (node->prefetch_maximum > 0)
   159         {
(lldb) print tbm->dsa
(dsa_area *) $3 = 0x0000000000000000
(lldb) print node->ss.ps.state->es_query_dsa
(dsa_area *) $5 = 0x0000000000000000
(lldb) f 17
frame #17: 0x000000010800363b postgres`ExecGather(pstate=0x00007f9743019320) + 635 at nodeGather.c:220
   217      * Get next tuple, either from one of our workers, or by running the plan
   218      * ourselves.
   219      */
-> 220     slot = gather_getnext(node);
   221     if (TupIsNull(slot))
   222         return NULL;
   223
(lldb) print *node->pei
(ParallelExecutorInfo) $8 = {
  planstate = 0x00007f9743019640
  pcxt = 0x00007f97450001b8
  buffer_usage = 0x0000000108b7e218
  instrumentation = 0x0000000108b7da38
  area = 0x0000000000000000
  param_exec = 0
  finished = '\0'
  tqueue = 0x0000000000000000
  reader = 0x0000000000000000
}
(lldb) print *node->pei->pcxt
warning: could not load any Objective-C class information. This will significantly reduce the quality of type information available.
(ParallelContext) $9 = {
  node = {
    prev = 0x000000010855fb60
    next = 0x000000010855fb60
  }
  subid = 1
  nworkers = 0
  nworkers_launched = 0
  library_name = 0x00007f9745000248 "postgres"
  function_name = 0x00007f9745000268 "ParallelQueryMain"
  error_context_stack = 0x0000000000000000
  estimator = (space_for_chunks = 180352, number_of_keys = 19)
  seg = 0x0000000000000000
  private_memory = 0x0000000108b53038
  toc = 0x0000000108b53038
  worker = 0x0000000000000000
}

I think there are two failure modes here: one of your sessions hit the
"too many ..." error (that's good: it ran out of slots, said so, and
our error machinery worked as it should), and another crashed with a
segfault because it tried to use a NULL "area" pointer (bad). I think
this is a degenerate case where we completely failed to launch the
parallel query, but ran the parallel query plan anyway, and this code
assumes the DSA area is available. Oops.
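
To illustrate that second failure mode, here is a hedged sketch (not the
actual fix, just the shape of a defensive check) of what BitmapHeapNext()
could do before assuming the DSA area exists, using the fields visible in
the frames above:

    /* Sketch only: if the parallel infrastructure never came up,
     * es_query_dsa is NULL, so fall back to a backend-local iterator
     * instead of letting tbm_prepare_shared_iterate() chase a NULL
     * dsa_area pointer. */
    if (pstate == NULL || node->ss.ps.state->es_query_dsa == NULL)
        node->tbmiterator = tbm_begin_iterate(tbm);
    else
        pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);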

--
Thomas Munro
http://www.enterprisedb.com
