Re: Check that streaming replica received all data after master shutdown

From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Vladimir Borodin <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Check that streaming replica received all data after master shutdown
Date: 2015-01-15 08:43:10
Message-ID: CADp-Sm7LWp-9Ngk5BHcfR1WSwnTtCNmdFkqROA0pD50fjGF5kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wed, Jan 14, 2015 at 2:11 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 01/13/2015 12:11 PM, Vladimir Borodin wrote:
>
>>
>> 05 янв. 2015 г., в 18:15, Vladimir Borodin <root(at)simply(dot)name> написал(а):
>>
>> Hi all.
>>>
>>> I have a simple script for planned switchover of PostgreSQL (9.3 and
>>> 9.4) master to one of its replicas. This script checks a lot of things
>>> before doing it and one of them is that all data from master has been
>>> received by replica that is going to be promoted. Right now the check is
>>> done like below:
>>>
>>> On the master:
>>>
>>> postgres(at)pgtest03d ~ $ psql -t -A -c 'select
>>> pg_current_xlog_location();'
>>> 0/33000090
>>> postgres(at)pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
>>> waiting for server to shut down.... done
>>> server stopped
>>> postgres(at)pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
>>> pg_control version number: 937
>>> Catalog version number: 201306121
>>> Database system identifier: 6061800518091528182
>>> Database cluster state: shut down
>>> pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
>>> Latest checkpoint location: 0/34000028
>>> Prior checkpoint location: 0/33000028
>>> Latest checkpoint's REDO location: 0/34000028
>>> Latest checkpoint's REDO WAL file: 0000001B0000000000000034
>>> Latest checkpoint's TimeLineID: 27
>>> postgres(at)pgtest03d ~ $
>>>
>>> On the replica (after shutdown of master):
>>>
>>> postgres(at)pgtest03g ~ $ psql -t -A -c "select
>>> pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/34000028');"
>>> 104
>>> postgres(at)pgtest03g ~ $
>>>
>>> These 104 bytes seems to be the size of shutdown checkpoint record (as I
>>> can understand from pg_xlogdump output).
>>>
>>> postgres(at)pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump
>>> -s 0/33000090 -t 27
>>> rmgr: XLOG len (rec/tot): 0/ 32, tx: 0, lsn:
>>> 0/33000090, prev 0/33000028, bkp: 0000, desc: xlog switch
>>> rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn:
>>> 0/34000028, prev 0/33000090, bkp: 0000, desc: checkpoint: redo 0/34000028;
>>> tli 27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0;
>>> oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0;
>>> shutdown
>>> pg_xlogdump: FATAL: error in WAL record at 0/34000028: record with zero
>>> length at 0/34000090
>>>
>>> postgres(at)pgtest03g ~/9.3/data/pg_xlog $
>>>
>>> I’m not sure that these 104 bytes will always be 104 bytes to have a
>>> strict equality while checking. Could it change in the future? Or is there
>>> a better way to understand that streaming replica received all data after
>>> master shutdown? The check that pg_xlog_location_diff returns 104 bytes
>>> seems a bit strange.
>>>
>>
> Don't rely on it being 104 bytes. It can vary across versions, and across
> different architectures.
>
> You could simply check that the standby's pg_last_xlog_replay_location() >
> master's "Latest checkpoint location", and not care about the exact
> difference.
>
>
>

​I believe there were some changes made in v9.3 which will wait for pending
WALs to be replica​ted before a fast and smart shutdown (of master) can
close the replication connection.

http://git.postgresql.org/pg/commitdiff/985bd7d49726c9f178558491d31a570d47340459

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-01-15 10:19:01 Re: [HACKERS] Check that streaming replica received all data after master shutdown
Previous Message Daniel Verite 2015-01-14 14:28:21 Re: Simple Atomic Relationship Insert

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2015-01-15 09:50:36 Re: Fillfactor for GIN indexes
Previous Message Peter Geoghegan 2015-01-15 08:40:27 Re: hung backends stuck in spinlock heavy endless loop