backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date: 2013-01-30 14:11:04
Message-ID: 20DAEA8949EC4E2289C6E8E58560DEC0@maumau
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Since we've fixed a couple of relatively nasty bugs recently, the core
> committee has determined that it'd be a good idea to push out PG update
> releases soon. The current plan is to wrap on Monday Feb 4 for public
> announcement Thursday Feb 7. If you're aware of any bug fixes you think
> ought to get included, now's the time to get them done ...

I've just encountered another serious bug, which I would like to see fixed
in the upcoming minor release.

I'm using streaming replication with PostgreSQL 9.1.6 on Linux (RHEL6.2,
kernel 2.6.32). However, this problem should occur whether or not streaming
replication is in use.

When I ran "pg_ctl stop -mi" against the primary, some applications
connected to the primary did not stop. The cause was that their backends
were deadlocked in quickdie(), with a call stack like the following. I'm
sorry, I left the stack trace file on the testing machine, so I'll post the
precise stack trace tomorrow.

some lock function
malloc()
gettext()
errhint()
quickdie()
<signal handler called because of SIGQUIT>
free()
...
PostgresMain()
...

The root cause is that gettext() is called in the signal handler quickdie()
via errhint(). As you know, malloc() cannot be called in a signal handler:

http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

[Excerpt]
On most systems, malloc and free are not reentrant, because they use a
static data structure which records what memory blocks are free. As a
result, no library functions that allocate or free memory are reentrant.
This includes functions that allocate space to store a result.
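
To illustrate the constraint in the excerpt, here is a minimal sketch of a
handler that stays within async-signal-safe calls: it reports with write() on
a message prepared in advance, so nothing is allocated or translated inside
the handler. The names safe_quit_handler, demo_safe_handler, quit_msg and
report_fd are hypothetical and are not PostgreSQL symbols; the pipe exists
only so that the demo can check what the handler wrote.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical message text, built before any signal can arrive. */
static const char quit_msg[] = "terminating connection\n";
static int report_fd = STDERR_FILENO;   /* where the handler reports */

/* Handler restricted to async-signal-safe calls: no malloc(), no gettext(). */
static void safe_quit_handler(int signo)
{
    int saved_errno = errno;    /* write() may clobber errno */
    ssize_t n;

    (void) signo;
    n = write(report_fd, quit_msg, sizeof(quit_msg) - 1);
    (void) n;
    errno = saved_errno;
    /* A real shutdown handler would end with _exit(), also async-signal-safe. */
}

/* Demo driver: returns 1 if the handler delivered the message through a pipe. */
int demo_safe_handler(void)
{
    struct sigaction sa;
    char buf[64];
    int fds[2];
    ssize_t got;

    assert(sizeof(quit_msg) - 1 < sizeof(buf));
    if (pipe(fds) != 0)
        return 0;
    report_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_quit_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, NULL) != 0)
        return 0;

    raise(SIGQUIT);             /* stand-in for the postmaster's SIGQUIT */

    got = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);
    report_fd = STDERR_FILENO;
    return got == (ssize_t) (sizeof(quit_msg) - 1)
        && memcmp(buf, quit_msg, sizeof(quit_msg) - 1) == 0;
}
```

The trade-off is that the message cannot be translated or formatted at signal
time, since both of those paths may allocate memory.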

And gettext() calls malloc(), as reported below:

http://lists.gnu.org/archive/html/bug-coreutils/2005-04/msg00056.html

I think the solution is the typical one: quickdie() just records the
receipt of SIGQUIT in a global variable and calls siglongjmp(), and the
tasks currently done in quickdie() are performed in PostgresMain() when
sigsetjmp() returns there.
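
The mechanism described above could be sketched roughly as follows. This is a
standalone illustration, not a patch: demo_quickdie, demo_main_loop,
quit_requested, can_jump and shutdown_jmp are hypothetical names, and
demo_main_loop only stands in for PostgresMain(). The demo raises the signal
at a point where no non-reentrant function is in progress. As far as I
understand POSIX, if the signal had interrupted something like free(), calling
non-async-signal-safe functions after the jump would still be undefined, so
the sketch shows only the control flow.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names; none of these are PostgreSQL symbols. */
static sigjmp_buf shutdown_jmp;                  /* jump target in the main loop */
static volatile sig_atomic_t quit_requested = 0; /* set by the handler */
static volatile sig_atomic_t can_jump = 0;       /* true once the buffer is valid */

/* The handler only remembers the signal and jumps back to normal context. */
static void demo_quickdie(int signo)
{
    (void) signo;
    quit_requested = 1;
    if (can_jump)
        siglongjmp(shutdown_jmp, 1);
}

/*
 * Stand-in for the main loop. Returns 1 if the shutdown work ran in normal
 * context after the jump, 0 otherwise.
 */
int demo_main_loop(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_quickdie;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGQUIT, &sa, NULL);
    assert(rc == 0);
    (void) rc;

    /* savemask = 1, so the signal mask is restored when we jump back here. */
    if (sigsetjmp(shutdown_jmp, 1) != 0)
    {
        /* Normal context again: the work quickdie() does today would go here. */
        can_jump = 0;
        return quit_requested ? 1 : 0;
    }
    can_jump = 1;

    raise(SIGQUIT);     /* simulate the SIGQUIT sent at immediate shutdown */
    return 0;           /* not reached when the jump happens */
}
```

The can_jump guard keeps the handler from jumping into a buffer that
sigsetjmp() has not yet filled in.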

What do you think about this solution? Could you include the fix? If it's
okay and you want, I'll submit the patch.

Regards
MauMau
