pg_ctl restart - behaviour based on wrong instance

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_ctl restart - behaviour based on wrong instance
Date: 2011-03-18 17:19:49
Message-ID: 4d9c910aa4a6303c4592098b1bf997e5.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I am not sure the following pg_ctl behaviour is really a bug, but I find it unexpected enough to
report.

I was testing synchronous replication in a test setup on a single machine. (After all, one could
have different instances on different arrays, right? If you think this is an unlikely use-case,
perhaps the following is not important.)

There are two installations of 9.1devel (git as of today):
primary: /var/data1/pg_stuff/pg_installations/pgsql.vanilla_1
standby: /var/data1/pg_stuff/pg_installations/pgsql.vanilla_2

The standby's data_directory is generated by pg_basebackup from vanilla_1.

The problem is the very first run of pg_ctl restart:

pg_ctl first correctly decides that the standby instance (=vanilla_2) isn't yet running:

pg_ctl: PID file "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data/postmaster.pid" does
not exist

This is OK and expected. But then it continues (in the logfile) with:

FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 20519) running in data directory
"/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?

So, complaints about the *other* instance. It doesn't happen once a successful start (with pg_ctl
start) has happened.

It starts fine when started right away with 'start' instead of 'restart'.

Also, if it has been started once, it will react to 'pg_ctl restart' without the errors.

I'll attach a shell-script, that provokes the error, see the 'restart' on the line with the
comment: 'HERE'

It would seem (see below) that pg_ctl's final decision about the standby, (that is has started up)
is wrong; the standby does *not* eventually start.

Below the output of the attached shell script. (careful - it deletes stuff)
(It still contains some debug lines, but I didn't want to change it too much.)

$ clear; ./split_vanilla.sh

PGPASSFILE=/home/rijkers/.pg_rijkers
waiting for server to shut down.... done
server stopped
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000018'
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000019'
removed `/var/data1/pg_stuff/archive_dir/000000010000000000000019.00000020.backup'
removed `/var/data1/pg_stuff/archive_dir/00000001000000000000001A'
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/pg_basebackup
NOTICE: pg_stop_backup complete, all required WAL segments have been archived

BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin
PGPORT=6564
PGPASSFILE=/home/rijkers/.pg_rijkers
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/pg_ctl
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
UID PID PPID C STIME TTY STAT TIME CMD
rijkers 20519 1 20 17:19 pts/25 S+ 0:00
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/bin/postgres -D
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data
rijkers 20521 20519 0 17:19 ? Ss 0:00 \_ postgres: writer process
rijkers 20522 20519 0 17:19 ? Ss 0:00 \_ postgres: wal writer process
rijkers 20523 20519 0 17:19 ? Ss 0:00 \_ postgres: autovacuum launcher process
rijkers 20524 20519 0 17:19 ? Ss 0:00 \_ postgres: archiver process
rijkers 20525 20519 0 17:19 ? Ss 0:00 \_ postgres: stats collector process

BINDIR = /var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/bin
PGPORT=6664
PGPASSFILE=/home/rijkers/.pg_rijkers
PGDATA=/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data
/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/bin/pg_ctl
pg_ctl: PID file "/var/data1/pg_stuff/pg_installations/pgsql.vanilla_2/data/postmaster.pid" does
not exist
Is server running?
starting server anyway
waiting for server to start............................................................... done
server started

-- logfile 1:
LOG: database system is shut down
LOG: database system was shut down at 2011-03-18 17:19:54 CET
LOG: autovacuum launcher started
LOG: database system is ready to accept connections

-- logfile 2:
LOG: shutting down
LOG: database system is shut down
FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 20519) running in data directory
"/var/data1/pg_stuff/pg_installations/pgsql.vanilla_1/data"?

thanks,

Erik Rijkers

Attachment Content-Type Size
split_vanilla.sh application/octet-stream 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-03-18 17:25:16 Re: Sync Rep and shutdown Re: Sync Rep v19
Previous Message Josh Berkus 2011-03-18 17:17:37 Re: 2nd Level Buffer Cache