Postmaster can't stop with pg_ctl

From: takuya koide <koide-txa(at)necst(dot)nec(dot)co(dot)jp>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Postmaster can't stop with pg_ctl
Date: 2007-04-25 08:28:02
Message-ID: 20070425171347.D1AB.KOIDE-TXA@necst.nec.co.jp
Lists: pgsql-bugs

============================================================================
POSTGRESQL BUG REPORT
============================================================================

Your name : Takuya Koide
Your email address : koide-txa (at) necst (dot) nec (dot) co (dot) jp

Category : runtime: back-end:
Severity : serious

Summary: Postmaster can't stop with pg_ctl

System Configuration
--------------------
Operating System : Red Hat Enterprise Linux ES release 4 (Nahant Update 3)

PostgreSQL version : PostgreSQL 8.2.4 on i686-redhat-linux-gnu,
compiled by GCC gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)

Note: I use the following RPM packages.

$ rpm -qa|grep -i postgresql
postgresql-server-8.2.4-1PGDG
postgresql-plperl-8.2.4-1PGDG
postgresql-8.2.4-1PGDG
postgresql-contrib-8.2.4-1PGDG
postgresql-docs-8.2.4-1PGDG
postgresql-plpython-8.2.4-1PGDG
postgresql-test-8.2.4-1PGDG
postgresql-libs-8.2.4-1PGDG
postgresql-devel-8.2.4-1PGDG
postgresql-pltcl-8.2.4-1PGDG

Compiler used : gcc

Hardware:
---------
x86

Versions of other tools:
------------------------

--------------------------------------------------------------------------

Problem Description:
--------------------
I found that pg_ctl cannot stop the postmaster processes under some conditions.

When a PostgreSQL process is in an abnormal state (stalled), I would like to
stop PostgreSQL (and restart it) with /etc/rc.d/init.d/postgresql, but the
stop fails and the processes keep running.

--------------------------------------------------------------------------

Test Case (reproduce procedures):
---------------------------------
The problem can be reproduced with the following steps.

1) Confirm the current status.
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3507 0.1 1.0 21352 2800 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 3509 0.0 0.2 11132 568 ? S 18:48 0:00 postgres: logger process
postgres 3514 0.0 0.3 21352 844 ? S 18:48 0:00 postgres: writer process
postgres 3515 0.0 0.2 12132 564 ? S 18:48 0:00 postgres: stats buffer process
postgres 3516 0.0 0.2 11364 748 ? S 18:48 0:00 postgres: stats collector process

2) Connect with psql as the postgres user.
$ id
uid=26(postgres) gid=26(postgres) group=26(postgres) context=user_u:system_r:unconfined_t
-bash-3.1$ psql template1
template1=#

3) Check the status again.
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3507 0.0 1.1 21352 2804 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 3509 0.0 0.2 11132 568 ? S 18:48 0:00 postgres: logger process
postgres 3514 0.0 0.3 21352 852 ? S 18:48 0:00 postgres: writer process
postgres 3515 0.0 0.2 12132 564 ? S 18:48 0:00 postgres: stats buffer process
postgres 3516 0.0 0.3 11364 772 ? S 18:48 0:00 postgres: stats collector process
postgres 3618 0.0 0.6 8476 1752 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.8 22012 2124 ? S 18:54 0:00 postgres: postgres template1 [local] idle

4) Send SIGSTOP to the backend process serving the psql session (PID 3619).
# kill -SIGSTOP 3619
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3507 0.0 1.1 21352 2804 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 3509 0.0 0.2 11132 568 ? S 18:48 0:00 postgres: logger process
postgres 3514 0.0 0.3 21352 852 ? S 18:48 0:00 postgres: writer process
postgres 3515 0.0 0.2 12132 564 ? S 18:48 0:00 postgres: stats buffer process
postgres 3516 0.0 0.3 11364 772 ? S 18:48 0:00 postgres: stats collector process
postgres 3618 0.0 0.6 8476 1752 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.8 22012 2124 ? T 18:54 0:00 postgres: postgres template1 [local] idle

5) Try to stop PostgreSQL the normal way, with the init script.
# /etc/rc.d/init.d/postgresql stop
postgresql stopping service: [fail]

6) Check the status: PostgreSQL has not stopped. (This is the problem.)
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3507 0.0 1.1 21352 2816 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 3509 0.0 0.2 11132 568 ? S 18:48 0:00 postgres: logger process
postgres 3514 0.0 0.3 21352 852 ? S 18:48 0:00 postgres: writer process
postgres 3515 0.0 0.2 12132 564 ? S 18:48 0:00 postgres: stats buffer process
postgres 3516 0.0 0.3 11364 772 ? S 18:48 0:00 postgres: stats collector process
postgres 3618 0.0 0.6 8476 1752 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.8 22012 2124 ? T 18:54 0:00 postgres: postgres template1 [local] idle

7) Try to stop PostgreSQL by sending SIGINT (fast shutdown) to the postmaster.
# kill -SIGINT 3507

8) Check the status: PostgreSQL still has not stopped. (This is also a problem.)
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3507 0.0 1.1 21352 2816 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 3509 0.0 0.2 11132 568 ? S 18:48 0:00 postgres: logger process
postgres 3514 0.0 0.3 21352 852 ? S 18:48 0:00 postgres: writer process
postgres 3515 0.0 0.2 12132 564 ? S 18:48 0:00 postgres: stats buffer process
postgres 3516 0.0 0.3 11364 772 ? S 18:48 0:00 postgres: stats collector process
postgres 3618 0.0 0.6 8476 1752 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.8 22012 2124 ? T 18:54 0:00 postgres: postgres template1 [local] idle

9) Stop PostgreSQL with SIGKILL, killing the remaining processes one at a time.
# kill -SIGKILL 3507
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3509 0.0 0.2 11132 564 ? S 18:48 0:00 postgres: logger process
postgres 3618 0.0 0.5 8476 1520 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.7 22012 1976 ? T 18:54 0:00 postgres: postgres template1 [local] idle
# kill -SIGKILL 3509
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3618 0.0 0.5 8476 1520 pts/3 S+ 18:54 0:00 psql template1
postgres 3619 0.0 0.7 22012 1976 ? T 18:54 0:00 postgres: postgres template1 [local] idle
# kill -SIGKILL 3619
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
postgres 3618 0.0 0.5 8476 1520 pts/3 S+ 18:54 0:00 psql template1
# kill -SIGKILL 3618
# ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
#
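From step 4 onward the ps listings show the backend (PID 3619) in state T, i.e. stopped. During a fast shutdown the postmaster sends SIGTERM to each backend and waits for it to exit; a stopped process cannot act on that signal until it receives SIGCONT, so the shutdown never completes. A minimal sketch of how a script could detect such stopped processes follows, assuming procps `ps` (as shipped with RHEL); the helper name `stopped_pids` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "PID STAT" lines on stdin and print each PID
# whose process state starts with T (stopped, e.g. by SIGSTOP).
# awk's default field splitting ignores the leading blanks that ps
# uses to right-align the PID column.
stopped_pids() {
    awk '$2 ~ /^T/ { print $1 }'
}

# Example use (assumes procps ps and a server running as user postgres):
#   ps -o pid=,stat= -u postgres | stopped_pids
```

On the system in this report, that pipeline would be expected to print only 3619 after step 4, since the postmaster, its helper processes and psql are all in state S or S+.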

--------------------------------------------------------------------------

Solution:
---------
I suggest the following method to resolve this issue.
If you think the idea is good, please feel free to use it.

[Current script]
Excerpt from /etc/rc.d/init.d/postgresql (the PostgreSQL rc script):
-------------------------------------------------------------------
stop(){
echo -n $"Stopping ${NAME} service: "
$SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null
ret=$?
-------------------------------------------------------------------

When a PostgreSQL process is stalled:

1. Run '/etc/rc.d/init.d/postgresql stop'.
2. The pg_ctl command on the third line of stop() runs, gives up waiting,
and returns exit code 1, as in the following example:

$ pg_ctl stop -m fast
waiting for postmaster to shut down........ failed
pg_ctl: postmaster does not shut down
$ echo $?
1
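pg_ctl reports the failed shutdown through its exit status (1 here), so an rc script can test that status and try again, which is what the proposed `until` loop does. The following POSIX sh sketch bounds the number of attempts; `retry`, `MAX_TRIES` and `WAIT_SECS` are hypothetical names standing in for the "user's designated" limit and delay, and their defaults are arbitrary.

```shell
#!/bin/sh
# Hypothetical settings standing in for the "user's designated" values;
# the defaults are arbitrary and can be overridden from the environment.
MAX_TRIES=${MAX_TRIES:-3}
WAIT_SECS=${WAIT_SECS:-10}

# Run the given command until it exits 0, at most MAX_TRIES times,
# sleeping WAIT_SECS seconds between attempts. Returns 1 if every
# attempt failed, so the caller can fall back to a forced stop.
retry() {
    _tries=1
    until "$@"; do
        if [ "$_tries" -ge "$MAX_TRIES" ]; then
            return 1
        fi
        _tries=$((_tries + 1))
        sleep "$WAIT_SECS"
    done
    return 0
}
```

In stop(), the retry step could then be written as:

    retry $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null

where $SU, $PGENGINE and $PGDATA are the rc script's existing variables.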

So I propose adding the following steps to the PostgreSQL rc script, to be
run when pg_ctl fails:

/etc/rc.d/init.d/postgresql (PostgreSQL rc script, with the proposed steps added)
-------------------------------------------------------------------
stop(){
echo -n $"Stopping ${NAME} service: "
...snip...
$SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null
ret=$?

# if pg_ctl fails, perform the following steps
if [ ret value is 1 ]; then

# try again to stop postgresql with pg_ctl
until $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null; do

# pg_ctl may need some time to stop postgresql
sleep (user-specified interval)

# if pg_ctl still cannot stop postgresql after a few attempts
if [ loop count equals user-specified limit ]; then

# give up on pg_ctl
exit loop
fi
done

# forcibly terminate postgresql
if [ user wants postgresql forcibly terminated ]; then
1. send SIGCONT to the suspended postgresql processes.
2. if postgresql still does not stop, send SIGKILL to the postgresql processes.
3. release the shared memory used by postgresql.
(on Linux, ipcclean can be used)
fi
fi
-------------------------------------------------------------------
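The forced-termination branch of this proposal could be made concrete along the lines below. This is a sketch under stated assumptions, not a tested init-script change: it assumes the script runs as root, reads the postmaster PID from the first line of $PGDATA/postmaster.pid, uses procps `pgrep -P` to find the postmaster's children, and reuses the rc script's $SU and $PGENGINE variables to run ipcclean as the postgres user. The function names and the FORCE_WAIT setting are hypothetical.

```shell
#!/bin/sh
# Hypothetical forced-stop fallback for stop() in
# /etc/rc.d/init.d/postgresql, to be run only after pg_ctl has given up.
# Assumptions: runs as root; PGDATA, SU and PGENGINE are the rc script's
# variables; procps pgrep is installed; FORCE_WAIT is a made-up setting.
FORCE_WAIT=${FORCE_WAIT:-5}

# Print the postmaster PID (the first line of postmaster.pid), or nothing.
postmaster_pid() {
    head -n 1 "$1" 2>/dev/null
}

force_stop() {
    pm=$(postmaster_pid "$PGDATA/postmaster.pid")
    [ -n "$pm" ] || return 1

    # Record the postmaster and its direct children (backends, logger,
    # writer, stats processes) before anything is killed. Client
    # processes such as psql are not children and are left alone.
    pids="$pm $(pgrep -P "$pm")"

    # 1. Resume suspended processes, so a stopped backend can act on the
    #    SIGTERM the postmaster sent it during the fast shutdown.
    #    $pids is deliberately unquoted so it splits into separate PIDs.
    kill -s CONT $pids 2>/dev/null
    sleep "$FORCE_WAIT"

    # 2. SIGKILL any recorded process that is still alive.
    for p in $pids; do
        if kill -0 "$p" 2>/dev/null; then
            kill -s KILL "$p" 2>/dev/null
        fi
    done

    # 3. Release shared memory and semaphores owned by postgres with
    #    ipcclean (shipped with 8.2; never run it while a server is up).
    $SU -l postgres -c "$PGENGINE/ipcclean" > /dev/null 2>&1 < /dev/null
    return 0
}
```

In the proposed stop(), `force_stop` would take the place of the three numbered steps in the forced-termination branch. Collecting the child PIDs before sending any signal matters because once the postmaster is killed, its children are re-parented and `pgrep -P` can no longer find them.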

--------------------------------------------------------------------------

---
Takuya Koide
NEC System Technologies, Ltd.
