Re: a new standby server promotes itself to primary without cause

From: John Scalia <jayknowsunix(at)gmail(dot)com>
To: Scott Mead <scottm(at)openscg(dot)com>
Cc: Keith Fiske <keith(at)omniti(dot)com>, "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: a new standby server promotes itself to primary without cause
Date: 2015-10-08 15:43:09
Message-ID: CABzCKRAaego4oo1L-e3Fz3obGBdHZaiZ_qRF0fG53D4ezbeR7Q@mail.gmail.com
Lists: pgsql-admin

That fixed it, thanks Scott. I wish there had been more detail about that
parameter on the PostgreSQL website (maybe I could add it). I commented out
that parameter and successfully restarted the server. This time it remained
a standby.
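
For anyone who finds this thread later, the working recovery.conf is roughly
the one I posted below, minus the recovery_target line (same paths and
connection settings as in my earlier message):

```
standby_mode = on
primary_conninfo = 'host=10.32.10.175 port=5432 user=replicator'
primary_slot_name = 'replicant'
trigger_file = '/var/lib/pgsql/9.4/share/trigger_file'
restore_command = 'cp /var/lib/pgsql/9.4/share/%f %p'
archive_cleanup_command = '/usr/pgsql-9.4/bin/pg_archivecleanup /var/lib/pgsql/9.4/share %r'
# recovery_target = 'immediate'  <-- removed; with this set, recovery ends at
#                                    the first consistent point and the server
#                                    leaves standby mode instead of streaming
```

You can confirm the server stayed a standby by running
SELECT pg_is_in_recovery(); against it, which should return t.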

Thanks again,
Jay

On Thu, Oct 8, 2015 at 10:04 AM, Scott Mead <scottm(at)openscg(dot)com> wrote:

> On Thu, Oct 8, 2015 at 10:59 AM, John Scalia <jayknowsunix(at)gmail(dot)com>
> wrote:
>
>> Yep, they're both in different directories and on different ports. Allow
>> me to note that I have both hand-built this file from the sample provided
>> in /usr/pgsql-9.4/share and used pg_basebackup's -R option to generate it
>> automatically. Neither seems to make a difference. Here's the recovery.conf
>> file I'm using on the standby.
>>
>> restore_command = 'cp /var/lib/pgsql/9.4/share/%f %p' # e.g.
>> 'cp /mnt/server/archivedir/%f %p'
>> archive_cleanup_command = '/usr/pgsql-9.4/bin/pg_archivecleanup
>> /var/lib/pgsql/9.4/share %r'
>> #recovery_end_command = ''
>> #recovery_target_name = '' # e.g. 'daily backup 2011-01-26'
>> #recovery_target_time = '' # e.g. '2004-07-14 22:39:00 EST'
>> #recovery_target_xid = ''
>> #recovery_target_inclusive = true
>> recovery_target = 'immediate'
>>
>
> I believe you should omit recovery_target. From
> http://www.postgresql.org/docs/9.4/static/recovery-target-settings.html
>
> recovery_target = 'immediate'
>
> This parameter specifies that recovery should end as soon as a consistent
> state is reached, i.e. as early as possible. When restoring from an online
> backup, this means the point where taking the backup ended.
>
>
> --
> Scott Mead
> Sr. Architect
> OpenSCG
> PostgreSQL, Java & Linux Experts
>
>
> http://openscg.com
>
>
>> pause_at_recovery_target = false
>> standby_mode = on
>> primary_conninfo = 'host=10.32.10.175 port=5432 user=replicator'
>> # e.g. 'host=localhost port=5432'
>> primary_slot_name = 'replicant'
>> trigger_file = '/var/lib/pgsql/9.4/share/trigger_file'
>> #recovery_min_apply_delay = 0
>>
>> I have also gone through the master's postgresql.conf file several times
>> looking for problems, and I've verified the replication slot exists.
>>
>> Thanks for your assistance,
>> Jay
>>
>> On Thu, Oct 8, 2015 at 9:51 AM, Keith Fiske <keith(at)omniti(dot)com> wrote:
>>
>>>
>>> On Thu, Oct 8, 2015 at 9:24 AM, John Scalia <jayknowsunix(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have an ongoing discussion here about retiring Slony and
>>>> implementing streaming synchronous replication. As part of this effort,
>>>> I've been trying to build a small cluster with one primary and one standby.
>>>> Yes, I know about the potential problems with a single standby, but this is
>>>> only to provide configuration parameters to my sysadmins.
>>>>
>>>> What I've tried is building the two servers on a single sandbox server.
>>>> The primary has been running correctly for some time. The standby starts,
>>>> but quickly turns itself into a primary without any trigger file being
>>>> present. My process is:
>>>>
>>>> 1) build the new standby in a different directory with pg_basebackup
>>>> 2) edit the standby's postgresql.conf so the standby has a different
>>>> port.
>>>> 3) start the standby.
>>>> 4) immediately after starting the standby, I see a recovery.done file
>>>> in that PGDATA dir.
>>>>
>>>> Here's part of the debug2 level log file from that standby, but I'm not
>>>> seeing why this standby is becoming a primary. Only that it is:
>>>>
>>>> [2015-10-08 07:18:19.448 CDT] @: LOG: 00000: database system is ready
>>>> to accept read only connections
>>>> [2015-10-08 07:18:19.448 CDT] @: LOCATION: sigusr1_handler,
>>>> postmaster.c:4857
>>>> cp: cannot stat `/var/lib/pgsql/9.4/share/00000002.history': No such
>>>> file or directory
>>>> [2015-10-08 07:18:19.477 CDT] @: DEBUG: 00000: could not restore file
>>>> "00000002.history" from archive: child process exited with exit code 1
>>>> [2015-10-08 07:18:19.477 CDT] @: LOCATION: RestoreArchivedFile,
>>>> xlogarchive.c:304
>>>> [2015-10-08 07:18:19.477 CDT] @: LOG: 00000: selected new timeline ID:
>>>> 2
>>>> [2015-10-08 07:18:19.477 CDT] @: LOCATION: StartupXLOG, xlog.c:7107
>>>> cp: cannot stat `/var/lib/pgsql/9.4/share/00000001.history': No such
>>>> file or directory
>>>> [2015-10-08 07:18:19.485 CDT] @: DEBUG: 00000: could not restore file
>>>> "00000001.history" from archive: child process exited with exit code 1
>>>> [2015-10-08 07:18:19.485 CDT] @: LOCATION: RestoreArchivedFile,
>>>> xlogarchive.c:304
>>>> [2015-10-08 07:18:19.792 CDT] @: LOG: 00000: archive recovery complete
>>>> [2015-10-08 07:18:19.792 CDT] @: LOCATION: exitArchiveRecovery,
>>>> xlog.c:5417
>>>> [2015-10-08 07:18:19.798 CDT] @: DEBUG: 00000: performing replication
>>>> slot checkpoint
>>>> [2015-10-08 07:18:19.798 CDT] @: LOCATION: CheckPointReplicationSlots,
>>>> slot.c:794
>>>> [2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: attempting to remove
>>>> WAL segments older than log file 0000000000000006000000D8
>>>> [2015-10-08 07:18:19.814 CDT] @: LOCATION: RemoveOldXlogFiles,
>>>> xlog.c:3775
>>>> [2015-10-08 07:18:19.814 CDT] @: DEBUG: 00000: SlruScanDirectory
>>>> invoking callback on pg_multixact/offsets/0000
>>>> [2015-10-08 07:18:19.814 CDT] @: LOCATION: SlruScanDirectory,
>>>> slru.c:1307
>>>> [2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory
>>>> invoking callback on pg_multixact/members/0000
>>>> [2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory,
>>>> slru.c:1307
>>>> [2015-10-08 07:18:19.815 CDT] @: DEBUG: 00000: SlruScanDirectory
>>>> invoking callback on pg_multixact/offsets/0000
>>>> [2015-10-08 07:18:19.815 CDT] @: LOCATION: SlruScanDirectory,
>>>> slru.c:1307
>>>> [2015-10-08 07:18:19.893 CDT] @: DEBUG: 00000: attempting to remove
>>>> WAL segments newer than log file 0000000200000006000000E1
>>>> [2015-10-08 07:18:19.893 CDT] @: LOCATION: RemoveNonParentXlogFiles,
>>>> xlog.c:5458
>>>> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: oldest MultiXactId
>>>> member is at offset 1
>>>> [2015-10-08 07:18:19.899 CDT] @: LOCATION: SetOffsetVacuumLimit,
>>>> multixact.c:2677
>>>> [2015-10-08 07:18:19.899 CDT] @: LOG: 00000: MultiXact member
>>>> wraparound protections are now enabled
>>>> [2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset,
>>>> multixact.c:2587
>>>> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: MultiXact member stop
>>>> limit is now 4294914944 based on MultiXact 1
>>>> [2015-10-08 07:18:19.899 CDT] @: LOCATION: DetermineSafeOldestOffset,
>>>> multixact.c:2590
>>>> [2015-10-08 07:18:19.899 CDT] @: DEBUG: 00000: release all standby
>>>> locks
>>>> [2015-10-08 07:18:19.899 CDT] @: LOCATION: StandbyReleaseAllLocks,
>>>> standby.c:666
>>>> [2015-10-08 07:18:19.903 CDT] @: LOG: 00000: starting background
>>>> worker process "powa"
>>>> [2015-10-08 07:18:19.904 CDT] @: LOCATION: do_start_bgworker,
>>>> postmaster.c:5412
>>>> [2015-10-08 07:18:19.905 CDT] @: LOG: 00000: database system is ready
>>>> to accept connections
>>>>
>>>> The one thing that's a little weird on this configuration is that both
>>>> the primary and the standby are on the same system, and I've never done a
>>>> configuration this way before. Usually, these are on different systems, but
>>>> this was done just to test some configuration parameters. Can a standby and
>>>> a primary not exist together on the same system? In any event, there is no
>>>> trigger file as specified in the recovery.conf on this system. So, why is
>>>> promotion occurring?
>>>> --
>>>> Jay
>>>>
>>>
>>>
>>> As long as they're on different ports and using different data directories,
>>> you should be fine.
>>> Can you share the contents of your recovery.conf file?
>>>
>>> --
>>> Keith Fiske
>>> Database Administrator
>>> OmniTI Computer Consulting, Inc.
>>> http://www.keithf4.com
>>>
>>
>>
>
