Skip site navigation (1) Skip section navigation (2)

pg_ctl restart - behaviour based on wrong instance

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_ctl restart - behaviour based on wrong instance
Date: 2011-03-18 17:19:49
Message-ID: 4d9c910aa4a6303c4592098b1bf997e5.squirrel@webmail.xs4all.nl (view raw or flat)
Thread:
Lists: pgsql-hackers
I am not sure the following pg_ctl behaviour is really a bug, but I find it unexpected enough to
report.

I was testing synchronous replication in a test setup on a single machine.  (After all, one could
have different instances on different arrays, right?  If you think this is an unlikely use-case,
perhaps the following is not important.)

There are two installations of 9.1devel (git as of today):
  primary: /var/data1/pg_stuff/pg_installations/pgsql.vanilla_1
  standby: /var/data1/pg_stuff/pg_installations/pgsql.vanilla_2

The standby's data_directory is generated by pg_basebackup from vanilla_1.

The problem is the very first run of  pg_ctl restart:

pg_ctl first correctly decides that the standby instance (=vanilla_2) isn't yet running:

pg_ctl: PID file "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data/postmaster.pid" does
not exist

This is OK and expected.  But then it continues (in the logfile) with:

FATAL:  lock file "postmaster.pid" already exists
HINT:  Is another postmaster (PID 20519) running in data directory
"/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?

So, complaints about the *other* instance.  It doesn't happen once a successful start (with pg_ctl
start) has happened.

It starts fine when started right away with 'start' instead of  'restart'.

Also, if it has been started once, it will react to 'pg_ctl restart' without the errors.

I'll attach a shell-script, that provokes the error, see the 'restart' on the line with the
comment: 'HERE'

It would seem (see below) that pg_ctl's final decision about the standby, (that is has started up)
is wrong; the standby does *not* eventually start.


Below the output of the attached shell script.  (careful - it deletes stuff)
(It still contains some debug lines, but I didn't want to change it too much.)


$ clear; ./split_vanilla.sh

PGPASSFILE=/home/rijkers/.pg_rijkers
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000018'
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000019'
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000019.00000020.backup'
removed `/var/data1/pg_stuff/archive_dir/00000001000000000000001A'
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/pg_basebackup
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived

BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin
PGPORT=6564
PGPASSFILE=/home/rijkers/.pg_rijkers
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/pg_ctl
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
rijkers  20519     1 20 17:19 pts/25   S+     0:00
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/postgres -D
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data
rijkers  20521 20519  0 17:19 ?        Ss     0:00  \_ postgres: writer process
rijkers  20522 20519  0 17:19 ?        Ss     0:00  \_ postgres: wal writer process
rijkers  20523 20519  0 17:19 ?        Ss     0:00  \_ postgres: autovacuum launcher process
rijkers  20524 20519  0 17:19 ?        Ss     0:00  \_ postgres: archiver process
rijkers  20525 20519  0 17:19 ?        Ss     0:00  \_ postgres: stats collector process

BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/bin
PGPORT=6664
PGPASSFILE=/home/rijkers/.pg_rijkers
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/bin/pg_ctl
pg_ctl: PID file "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data/postmaster.pid" does
not exist
Is server running?
starting server anyway
waiting for server to start............................................................... done
server started

-- logfile 1:
LOG:  database system is shut down
LOG:  database system was shut down at 2011-03-18 17:19:54 CET
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

-- logfile 2:
LOG:  shutting down
LOG:  database system is shut down
FATAL:  lock file "postmaster.pid" already exists
HINT:  Is another postmaster (PID 20519) running in data directory
"/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?




thanks,

Erik Rijkers

Attachment: split_vanilla.sh
Description: application/octet-stream (3.5 KB)

Responses

pgsql-hackers by date

Next:From: Robert HaasDate: 2011-03-18 17:25:16
Subject: Re: Sync Rep and shutdown Re: Sync Rep v19
Previous:From: Josh BerkusDate: 2011-03-18 17:17:37
Subject: Re: 2nd Level Buffer Cache

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group