From: | Keith Fiske <keith(at)omniti(dot)com> |
---|---|
To: | John Scalia <jayknowsunix(at)gmail(dot)com> |
Cc: | "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: a new standby server promotes itself to primary without cause |
Date: | 2015-10-08 14:51:45 |
Message-ID: | CAG1_KcAXoNRRk50RQpHwO77WSFWJfqrfNbs116ji1Xw4UhNJvA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
On Thu, Oct 8, 2015 at 9:24 AM, John Scalia <jayknowsunix(at)gmail(dot)com> wrote:
> Hi all,
>
> We have an ongoing discussion here about retiring slony and implementing
> streaming synchronous replication. As part of this effort, I've been trying
> to build a small cluster with one primary and one standby. Yes, I know
> about the potential problems with a single standby, but this is only to
> provide configuration parameters to my sysadmins.
>
> What I've tried is building the two servers on a single sandbox server.
> The primary has been running correctly for some time. The standby starts,
> but quickly turns itself into a primary without any trigger file being
> present. My process is:
>
> 1) build the new standby in a different directory with pg_basebackup
> 2) edit the standby's postgresql.conf so the standby has a different port.
> 3) start the standby.
> 4) immediately after starting the standby, I see a recovery.done file in
> that PGDATA dir.
>
> Here's part of the debug2 level log file from that standby, but I'm not
> seeing why this standby is becoming a primary. Only that it is:
>
> [2015-10-08 07:18:19.448 CDT] @: LOG: 00000: database system is ready to
> accept read only connections
> [2015-10-08 07:18:19.448 CDT] @: LOCATION: sigusr1_handler,
> postmaster.c:4857
> cp: cannot stat `/var/lib/pgsql/9.4/share/00000002.history': No such file
> or directory
> [2015-10-08 07:18:19.477 CDT] @: DEBUG: 00000: could not restore file
> "00000002.history" from archive: child process exited with exit code 1
> [2015-10-08 07:18:19.477 CDT] @: LOCATION: RestoreArchivedFile,
> xlogarchive.c:304
> [2015-10-08 07:18:19.477 CDT] @: LOG: 00000: selected new timeline ID: 2
> [2015-10-08 07:18:19.477 CDT] @: LOCATION: StartupXLOG, xlog.c:7107
> cp: cannot stat `/var/lib/pgsql/9.4/share/00000001.history': No such file
> or directory
> [2015-10-08 07:18:19.485 CDT] @: DEBUG: 00000: could not restore file
> "00000001.history" from archive: child process exited with exit code 1
> [2015-10-08 07:18:19.485 CDT] @: LOCATION: RestoreArchivedFile,
> xlogarchive.c:304
> [2015-10-08 07:18:19.792 CDT] @: LOG: 00000: archive recovery complete
> [2015-10-08 07:18:19.792 CDT] @: LOCATION: exitArchiveRecovery,
> xlog.c:5417
> [2015-10-08 07:18:19.798 CDT] @: DEBUG: 00000: performing replication
> slot checkpoint
> [2015-10-08 07:18:19.798 CDT] @: LOCATION: CheckPointReplicationSlots,
> slot.c:794
> [2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: attempting to remove WAL
> segments older than log file 0000000000000006000000D8
> [2015-10-08 07:18:19.814 CDT] @: LOCATION: RemoveOldXlogFiles, xlog.c:3775
> [2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: SlruScanDirectory invoking
> callback on pg_multixact/offsets/0000
> [2015-10-08 07:18:19.814 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
> [2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory invoking
> callback on pg_multixact/members/0000
> [2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
> [2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory invoking
> callback on pg_multixact/offsets/0000
> [2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory, slru.c:1307
> [2015-10-08 07:18:19.893 CDT] @: DEBUG: 00000: attempting to remove WAL
> segments newer than log file 0000000200000006000000E1
> [2015-10-08 07:18:19.893 CDT] @: LOCATION: RemoveNonParentXlogFiles,
> xlog.c:5458
> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: oldest MultiXactId member
> is at offset 1
> [2015-10-08 07:18:19.899 CDT] @: LOCATION: SetOffsetVacuumLimit,
> multixact.c:2677
> [2015-10-08 07:18:19.899 CDT] @: LOG: 00000: MultiXact member wraparound
> protections are now enabled
> [2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset,
> multixact.c:2587
> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: MultiXact member stop
> limit is now 4294914944 based on MultiXact 1
> [2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset,
> multixact.c:2590
> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: release all standby locks
> [2015-10-08 07:18:19.899 CDT] @: LOCATION: StandbyReleaseAllLocks,
> standby.c:666
> [2015-10-08 07:18:19.903 CDT] @: LOG: 00000: starting background worker
> process "powa"
> [2015-10-08 07:18:19.904 CDT] @: LOCATION: do_start_bgworker,
> postmaster.c:5412
> [2015-10-08 07:18:19.905 CDT] @: LOG: 00000: database system is ready to
> accept connections
>
> The one thing that's a little weird on this configuration is that both the
> primary and the standby are on the same system, and I've never done a
> configuration this way before. Usually, these are on different systems, but
> this was done just to test some configuration parameters. Can a standby and
> a primary not exist together on the same system? In any event, there is no
> trigger file as specified in the recovery.conf on this system. So, why is
> promotion occuring?
> --
> Jay
>
Long as they're on different ports and using different data directories,
you should be fine.
Can you share the contents of your recovery.conf file?
--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com
From | Date | Subject | |
---|---|---|---|
Next Message | Scott Mead | 2015-10-08 14:54:36 | Re: a new standby server promotes itself to primary without cause |
Previous Message | rafael.burischipfer | 2015-10-08 14:35:32 | Doubts PostgreSQL |