Streaming Replication Failover

From: ning chan <ninchan8328(at)gmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Streaming Replication Failover
Date: 2013-01-17 05:17:30
Message-ID: CAG0k5vDu=qkKBWWa=jiSDxhXk6jww3-vPKHLQYq=aTzq9NcF8w@mail.gmail.com
Lists: pgsql-general

Hi,
I have a cluster of 3 nodes: the primary streams to Standby A, and
Standby A streams to Standby B.
I failed over the cluster:
1) stopped the primary
2) promoted Standby A

Now I see from the syslog on Standby B that it is complaining about a
timeline mismatch.

Replication Status from Primary
=============================================
|Parameters           |        Value        |
=============================================
|backend_start        | 2013-01-16 23:05:48 |
|pid                  |        17851        |
|usesysid             |          10         |
|usename              |       postgres      |
|application_name     |       StandbyA      |
|client_addr          |     10.89.94.31     |
|client_hostname      |                     |
|client_port          |        43558        |
|state                |      streaming      |
|sent_location        |      0/1EAC3E68     |
|write_location       |      0/1EAC3E68     |
|flush_location       |      0/1EAC3E68     |
|replay_location      |      0/1EAC3E68     |
|sync_priority        |          0          |
|sync_state           |        async        |
=============================================

Replication Status from Standby A
=============================================
|Parameters           |        Value        |
=============================================
|backend_start        | 2013-01-16 23:06:56 |
|pid                  |        12320        |
|usesysid             |          10         |
|usename              |       postgres      |
|application_name     |       StandByB      |
|client_addr          |     10.89.94.29     |
|client_hostname      |                     |
|client_port          |        48214        |
|state                |      streaming      |
|sent_location        |      0/1EAC3E68     |
|write_location       |      0/1EAC3E68     |
|flush_location       |      0/1EAC3E68     |
|replay_location      |      0/1EAC3E68     |
|sync_priority        |          0          |
|sync_state           |        async        |
=============================================
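
As a quick check that both standbys really are at the same position, the
locations in the two tables can be compared numerically. This is only a
minimal sketch: the helper names lsn_to_bytes and lsn_lag are made up for
illustration, and it assumes the usual "high/low" hexadecimal form of a
location, with the high part counting units of 4 GB.

```shell
# Convert a location such as 0/1EAC3E68 into a single byte offset.
# Assumption: the text before the slash is the high 32 bits and the text
# after it is the low 32 bits, both hexadecimal.
lsn_to_bytes() (
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( 0x$hi * 4294967296 + 0x$lo ))
)

# Byte distance between two locations (first minus second).
lsn_lag() (
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
)

# sent_location on the primary vs. replay_location reported for StandByB
lsn_lag 0/1EAC3E68 0/1EAC3E68   # prints 0
```

A result of 0 means the two nodes report the same position, which matches
what the tables show just before the failover.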

Now I fail over the primary.
On the Standby A syslog:
Jan 16 23:08:12 se032c-94-31 postgres[12316]: [3-1] 12316FATAL:
replication terminated by primary server
Jan 16 23:08:12 se032c-94-31 postgres[12312]: [5-1] 12312LOG:  redo starts
at 0/1EAC3E68

On the Standby B syslog:
Jan 16 23:09:48 localhost postgres[3932]: [5-1] LOG:  redo starts at
0/1EAC3E68

As soon as I promoted Standby A, replication between A and B broke. The
Standby B syslog shows the following:
Jan 16 23:11:28 localhost postgres[3945]: [2-1] FATAL:  timeline 15 of the
primary does not match recovery target timeline 14
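
To spot this condition when watching the log, the two timeline numbers can
be pulled out of the message. A rough sketch, assuming the message has
exactly the wording quoted above and sits on one line in the real syslog
(it is only wrapped in this mail); the function name timeline_mismatch is
made up for illustration.

```shell
# Read log lines on stdin and print "<primary timeline> <target timeline>"
# for every line that carries the mismatch message; other lines are ignored.
timeline_mismatch() {
  sed -n 's/.*timeline \([0-9][0-9]*\) of the primary does not match recovery target timeline \([0-9][0-9]*\).*/\1 \2/p'
}

printf '%s\n' \
  'Jan 16 23:11:28 localhost postgres[3945]: [2-1] FATAL:  timeline 15 of the primary does not match recovery target timeline 14' \
  | timeline_mismatch   # prints: 15 14
```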

Now my question is: while A and B are in sync, why does promoting A break
the replication?

To resolve the problem, I need to stop the engine on B, rsync from A,
and start the engine on B again.
rsync -a --progress --exclude postgresql.conf --exclude recovery.done \
  --exclude pg_hba.conf root@10.89.94.31:/opt/postgres/9.2/data/* \
  /opt/postgres/9.2/data

Do I need to sync the whole data directory from A? I have a small DB now (2
tables with only a few rows), but this may take a long time with a much
larger DB. Is there any shortcut? Why do I need to do the rsync when A and B
were originally in sync?

Thanks~
Ning