backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,<pgsql-hackers(at)postgreSQL(dot)org>
Subject: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date: 2013-01-30 14:11:04
Message-ID: 20DAEA8949EC4E2289C6E8E58560DEC0@maumau
Lists: pgsql-hackers
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon.  The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7.  If you're aware of any bug fixes you think
> ought to get included, now's the time to get them done ...

I've just encountered another serious bug, which I would like to see fixed 
in the upcoming minor release.

I'm using streaming replication with PostgreSQL 9.1.6 on Linux (RHEL6.2, 
kernel 2.6.32).  But this problem should happen regardless of the use of 
streaming replication.

When I ran "pg_ctl stop -mi" against the primary, some applications 
connected to the primary did not stop.  The cause was that their backends 
were deadlocked in quickdie() with a call stack like the following.  I'm 
sorry, I left the stack trace file on the testing machine, so I'll post the 
precise stack trace tomorrow.

some lock function
malloc()
gettext()
errhint()
quickdie()
<signal handler called because of SIGQUIT>
free()
...
PostgresMain()
...

The root cause is that gettext() is called in the signal handler quickdie() 
via errhint().  As you know, malloc() cannot be called in a signal handler:

http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

[Excerpt]
On most systems, malloc and free are not reentrant, because they use a 
static data structure which records what memory blocks are free. As a 
result, no library functions that allocate or free memory are reentrant. 
This includes functions that allocate space to store a result.


And gettext() calls malloc(), as reported below:

http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00056.html

I think the solution is the typical one: in quickdie(), just record the 
receipt of SIGQUIT in a global variable and call siglongjmp(), then perform 
the tasks currently done in quickdie() when sigsetjmp() returns in 
PostgresMain().

What do you think about this solution?  Could you include the fix?  If it's 
okay and you want, I'll submit a patch.

Regards
MauMau