Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)

From: Patrick Verdon <patrick(at)kan(dot)co(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed)
Date: 1999-01-29 16:05:28
Message-ID: 36B1DC48.8C52FD92@kan.co.uk
Lists: pgsql-hackers


Tatsuo, Vadim, Oleg, Scrappy,

Many thanks for the response.

A couple of you weren't convinced that this
is a Postgres problem, so let me try to clear
the waters a little. Maybe the use of
Apache and mod_perl is confusing the issue.
The point I was trying to make is that if
there are 49+ concurrent postgres processes
on a normal machine (i.e. one where the kernel
parameters are at their defaults, etc.), the
postmaster dies in a nasty way with
potentially damaging results.

Here's a case without Apache/mod_perl that
causes exactly the same behaviour. Simply
enter the following 49 times:

kandinsky:patrick> psql template1 &

Note that I tried to automate this without
success:

perl -e 'for ( 1..49 ) { system("/usr/local/pgsql/bin/psql template1 &"); }'
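
For anyone who wants to script the reproduction, here is a minimal
sketch of one way to hold many clients open at once. It assumes
(not verified here) that a backgrounded client exits early when it
has no open stdin, so each child is given a pipe that stays open
until it is closed on purpose. The helper names open_sessions and
close_sessions are made up for this example, and the psql path is
the one from the one-liner above.

```python
import subprocess
import sys


def open_sessions(cmd, count):
    """Start `count` copies of `cmd`, each with its own open stdin pipe.

    The child only sees EOF on stdin once we close our end of the pipe,
    so a client that reads commands from stdin stays connected until then.
    """
    procs = []
    for _ in range(count):
        procs.append(subprocess.Popen(cmd, stdin=subprocess.PIPE))
    return procs


def close_sessions(procs):
    """Close every stdin pipe and return the children's exit codes in order."""
    codes = []
    for proc in procs:
        proc.stdin.close()
        codes.append(proc.wait())
    return codes


# Hypothetical usage against the setup described in this message:
#   sessions = open_sessions(["/usr/local/pgsql/bin/psql", "template1"], 49)
#   ... inspect the postmaster log while the clients are connected ...
#   print(close_sessions(sessions))
```

Taking the command as a parameter means the same helper can be
tried with any harmless stdin-reading program before pointing it
at a real postmaster.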

The 49th attempt to initiate a connection
fails:

Connection to database 'template1' failed.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally before or while processing the request.

and the error_log says:

InitPostgres
IpcSemaphoreCreate: semget failed (No space left on device) key=5432017, num=16, permission=600
proc_exit(3) [#0]
shmem_exit(3) [#0]
exit(3)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 1521 exited with status 768
/usr/local/pgsql/bin/postmaster: CleanupProc: sending SIGUSR1 to process 1518
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.

FATAL: s_lock(dfebe065) at spin.c:125, stuck spinlock. Aborting.
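
The numbers in that log can be read with a little arithmetic. The
sketch below assumes each backend takes one semaphore from a set of
16 (the "num=16" in the semget line); that is an inference from the
log, not something confirmed against the source. Under that
assumption 48 backends fit in three sets and the 49th is the first
to need a fourth semget() call. "No space left on device" is the
text for ENOSPC, which semget() returns when a system-wide limit on
semaphore sets or on total semaphores would be exceeded; which of
the two limits applies on this machine is not shown by the log. The
status 768 reported by CleanupProc is a classic Unix wait status
whose high byte is the exit code, matching the proc_exit(3) above.

```python
import errno
import os

# Assumption: set size taken from "num=16" in the semget log line.
SEMS_PER_SET = 16


def sets_needed(backends, per_set=SEMS_PER_SET):
    """Semaphore sets required if every backend takes exactly one semaphore."""
    return -(-backends // per_set)  # ceiling division


def exit_code(wait_status):
    """Exit code encoded in a classic Unix wait status (bits 8-15)."""
    return (wait_status >> 8) & 0xFF


print(sets_needed(48), sets_needed(49))  # sets for 48 and for 49 backends
print(exit_code(768))                    # exit code behind "status 768"
print(os.strerror(errno.ENOSPC))         # platform's message for ENOSPC
```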

Even if there is a hard limit, there is no way that
Postgres should die in this spectacular fashion.
It doesn't seem unreasonable for some large
applications to peak at more than 48 processes
when using powerful hardware with plenty of RAM.

The other point is that even with 1 GB of RAM,
Postgres won't scale beyond 48 processes, which
probably use less than 100 MB of RAM between
them. Would it be possible to make 'MaxBackendId'
configurable for those who have the resources?

I have reproduced this behaviour on both
FreeBSD 2.2.8 and Intel Solaris 2.6 using
version 6.4.x of PostgreSQL.

I'll try changing some of the suggested
parameters and see how far I get, but the bottom
line is that Postgres shouldn't be dying like this.

Let me know if you need any more info.

Cheers.

Patrick

--

#===============================#
\ KAN Design & Publishing Ltd /
/ T: +44 (0)1223 511134 \
\ F: +44 (0)1223 571968 /
/ E: mailto:patrick(at)kan(dot)co(dot)uk \
\ W: http://www.kan.co.uk /
#===============================#
