Re: ERROR: too many dynamic shared memory segments

From: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: too many dynamic shared memory segments
Date: 2017-11-27 21:05:39
Message-ID: CAJk1zg0MPjTL1=NQ4a3PcYnc6jzfk7ZqZ91Wn+Sam9EZmbEYMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi Thomas,
doubling the max_connection has the problem gone away for now! Yay!

As for the crash. I dug up the initial log and it looks like a segmentation
fault...

2017-11-23 07:26:53 CET:192.168.10.83(35238):user(at)db:[30003]: ERROR: too
many dynamic shared memory segments
2017-11-23 07:26:53 CET:192.168.10.83(35238):user(at)db:[30003]: STATEMENT:
SELECT ....
2017-11-23 07:26:55 CET::@:[29398]: LOG: server process (PID 26992) was
terminated by signal 11: Segmentation fault
2017-11-23 07:26:55 CET::@:[29398]: DETAIL: Failed process was running:
SELECT .....
2017-11-23 07:26:55 CET::@:[29398]: LOG: terminating any other active
server processes
2017-11-23 07:26:55 CET:192.168.10.83(35242):user(at)db:[30009]: WARNING:
terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35242):user(at)db:[30009]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35242):user(at)db:[30009]: HINT: In a
moment you should be able to reconnect to the database and repeat your
command.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user(at)db:[30097]: WARNING:
terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35268):user(at)db:[30097]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user(at)db:[30097]: HINT: In a
moment you should be able to reconnect to the database and repeat your
command.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user(at)db:[31080]: WARNING:
terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(46614):user(at)db:[31080]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user(at)db: HINT: In a moment
you should be able to reconnect to the database and repeat your command.
2017-11-23 07:26:55 CET:192.168.10.83(35238):user(at)db:[30003]: WARNING:
terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35238):user(at)db:[30003]: DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory
2017-11-23 07:26:55 CET:192.168.10.83(35300):user(at)db:[30152]: FATAL: the
database system is in recovery mode

--
regards,
pozdrawiam,
Jakub Glapa

On Mon, Nov 27, 2017 at 7:53 PM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com
> wrote:

> On Tue, Nov 28, 2017 at 1:13 AM, Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
> wrote:
> > The queries are somehow special.
> > We are still using the old style partitioning (list type) but we abuse
> it a
> > bit when querying.
> > When querying a set of partitions instead of doing it via parent table we
> > stitch together the required tables with UNION ALL (this requires less
> > locking) and was more performant in our benchmark (the new native
> > partitioning might improve this but we didn't research that yet).
> >
> > The queries are in form of
> > SELECT col1,col2,col3 FROM
> > (SELECT *
> > FROM par1
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <=
> > '2017-11-26 23:59:59.999'
> > UNION ALL SELECT *
> > FROM par2
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend
> <=
> > '2017-11-26 23:59:59.999'
> > UNION ALL SELECT *
> > FROM par2
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend
> <=
> > '2017-11-26 23:59:59.999'
> >
> > WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <=
> > '2017-11-26 23:59:59.999'
> >
> > UNION ALL
> > ...
> > ) unionalias
> > WHERE ....
> >
> >
> >
> > and indeed the query planner shows the usage of Bitmap Heap Scan
> > ....
> > -> Subquery Scan on "*SELECT* 2"
> > (cost=3068.58..19793.94 rows=1 width=1284)
> > -> Gather
> (cost=3068.58..19793.93
> > rows=1 width=5014)
> > Workers Planned: 2
> > -> Parallel Bitmap Heap
> Scan
> > on par_6 (cost=2068.58..18793.83 rows=1 width=5014)
> > Recheck Cond:
> > <<CONDITION>>
> > Filter: <<CONDITION>>
> > -> BitmapAnd
> > (cost=2068.58..2068.58 rows=30955 width=0)
> > -> BitmapOr
> > (cost=999.30..999.30 rows=42989 width=0)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..990.21 rows=42922 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'one'::text)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..4.30 rows=1 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'two'::text)
> > -> Bitmap
> > Index Scan on par_6_datasource (cost=0.00..4.79 rows=67 width=0)
> >
> Index
> > Cond: ((datasource)::text = 'three'::text)
> > -> Bitmap Index
> > Scan on par_6_rangestart (cost=0.00..1069.02 rows=47564 width=0)
> > Index
> Cond:
> > (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
> > -> Subquery Scan on "*SELECT* 3"
> > (cost=761.33..7944.99 rows=1 width=1086)
> > -> Bitmap Heap Scan on par_7
> > (cost=761.33..7944.98 rows=1 width=4816)
> > Recheck Cond:
> > <<CONDITION>>
> > Filter: <<CONDITION>>
> > -> BitmapAnd
> > (cost=761.33..761.33 rows=7045 width=0)
> > -> Bitmap Index Scan
> on
> > par_7_rangestart (cost=0.00..380.35 rows=14942 width=0)
> > Index Cond:
> > (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
> > -> BitmapOr
> > (cost=380.72..380.72 rows=12248 width=0)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..372.00 rows=12228 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'one'::text)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..4.36 rows=10 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'two'::text)
> > -> Bitmap Index
> > Scan on par_7_datasource (cost=0.00..4.36 rows=10 width=0)
> > Index
> Cond:
> > ((datasource)::text = 'three'::text)
> >
> > ....
> >
> >
> > In this particular query there were over _100_ partitions connected with
> the
> > UNION ALL operator.
>
> Ah, so you have many Gather nodes under Append? That's one way to eat
> arbitrarily many DSM slots. We allow for 64 + 2 * max_backends. Does
> it help if you increase max_connections? I am concerned about the
> crash failure mode you mentioned in the first email though: we should
> always be able to handle that condition gracefully.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2017-11-27 22:48:54 Re: ERROR: too many dynamic shared memory segments
Previous Message Melvin Davidson 2017-11-27 19:29:32 Re: Setting a default for nextval sequence

Browse pgsql-hackers by date

  From Date Subject
Next Message Hadi Moshayedi 2017-11-27 21:12:25 Re: [PATCH] Fix crash in int8_avg_combine().
Previous Message Magnus Hagander 2017-11-27 21:05:06 Re: [HACKERS] More stats about skipped vacuums