pg_rewind fails after failover, 'invalid record length'

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: pg_rewind fails after failover, 'invalid record length'
Date: 2017-02-15 10:02:35
Message-ID: CADmi=6O_RApqN=4QWA72gAqt96p0s1+3g+=pN1xgEhVVPzt6qg@mail.gmail.com
Lists: pgsql-bugs

I have a test case with 3 PostgreSQL 9.5.5 servers: one master and two
hot standbys using standard streaming replication from the master.
wal_log_hints is not enabled, but all clusters were initialized with
data checksums.

The system is idle. I tear down the master, leaving the two standbys
orphaned at the same point in timeline 1.

I promote one of the standbys to master, switching it to timeline 2. I
shut down the other standby and attempt to run pg_rewind. It fails:

$ /usr/lib/postgresql/9.5/bin/pg_rewind \
    --target-pgdata=/var/lib/postgresql/9.5/main \
    --source-server='dbname=postgres host=10.0.4.212 port=5432 user=_juju_repl'
servers diverged at WAL position 0/5000AE0 on timeline 1

could not find previous WAL record at 0/5000AE0: invalid record length at 0/5000AE0
Failure, exiting

This is what the pg_xlog on the new master looked like at that point:

postgres@juju-4ead0d-11:~/9.5/main/pg_xlog$ ls -al
total 81993
drwx------  3 postgres postgres        9 Feb 15 08:55 .
drwx------ 19 postgres postgres       25 Feb 15 08:55 ..
-rw-------  1 postgres postgres 16777216 Feb 15 07:52 000000010000000000000002
-rw-------  1 postgres postgres 16777216 Feb 15 07:52 000000010000000000000003
-rw-------  1 postgres postgres 16777216 Feb 15 07:52 000000010000000000000004
-rw-------  1 postgres postgres 16777216 Feb 15 08:52 000000010000000000000005.partial
-rw-------  1 postgres postgres       41 Feb 15 08:55 00000002.history
-rw-------  1 postgres postgres 16777216 Feb 15 09:15 000000020000000000000005
drwx------  2 postgres postgres        6 Feb 15 08:55 archive_status
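For reference, the divergence point reported by pg_rewind can be matched to a file in the listing above. The shell function below is only a sketch and not part of pg_rewind; the name lsn_to_walfile is hypothetical, and it assumes the default 16 MB segment size and the 9.x file naming (timeline ID, high 32 bits of the LSN, then segment number within that range, each as 8 hex digits). For 0/5000AE0 on timeline 1 it gives 000000010000000000000005, which on the new master exists only as 000000010000000000000005.partial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a timeline ID and an LSN such as 0/5000AE0 to the
# name of the WAL segment file containing it. Assumes 16 MB segments and the
# PostgreSQL 9.x naming scheme: %08X (timeline) %08X (LSN high 32 bits)
# %08X (low 32 bits divided by the segment size).
lsn_to_walfile() {
  local tli=$1 lsn=$2
  local hi=${lsn%/*}   # part before the slash
  local lo=${lsn#*/}   # part after the slash
  local seg_size=$((16 * 1024 * 1024))
  printf '%08X%08X%08X\n' "$tli" "$((16#$hi))" "$((16#$lo / seg_size))"
}

# Divergence point from the pg_rewind output, on timeline 1.
lsn_to_walfile 1 0/5000AE0
```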

Reconfiguring the standby to replicate from the new master and
restarting it works fine: the standby happily replicates and switches
to the new timeline. If I then shut this standby down and run pg_rewind
again, it succeeds.

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/
