BUG #14591: WALs differ on a standby node, from streaming to archive_command

From: alessandro(dot)grassi(at)2ndquadrant(dot)it
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14591: WALs differ on a standby node, from streaming to archive_command
Date: 2017-03-16 17:05:13
Message-ID: 20170316170513.1429.77904@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14591
Logged by: Alessandro Grassi
Email address: alessandro(dot)grassi(at)2ndquadrant(dot)it
PostgreSQL version: 9.6.2
Operating system: Ubuntu 16.04


I have two nodes running PostgreSQL 9.6 in streaming replication.
Both of them send WALs to another system with streaming replication and
archive_command simultaneously.
This means that I should receive four exact copies of the same file.

However, this only happens with three of them: the WALs from the primary
node are the same, the WAL received with the archive_command from the
standby node is the same as the other two, but the WAL received from the
standby with streaming replication is different.
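
For anyone who wants to repeat the comparison, here is a minimal sketch of the "are all copies identical" check. The function name all_identical and the demo files are hypothetical, not part of the original setup; it only relies on cmp -s, which exits non-zero when two files differ.

```shell
# Hypothetical helper: compare every copy of a WAL file against the first one.
# Returns 0 if all files are byte-identical, 1 otherwise.
all_identical() {
    first=$1
    shift
    for f in "$@"; do
        # cmp -s is silent and only reports through its exit status
        cmp -s "$first" "$f" || { echo "differs: $f"; return 1; }
    done
    echo "all copies identical"
}

# Demo with throwaway files standing in for the four received copies.
demo_dir=$(mktemp -d)
printf 'wal-data' > "$demo_dir/primary_stream"
printf 'wal-data' > "$demo_dir/primary_archive"
printf 'wal-data' > "$demo_dir/standby_archive"
printf 'wal-dat0' > "$demo_dir/standby_stream"
all_identical "$demo_dir"/primary_stream "$demo_dir"/primary_archive \
    "$demo_dir"/standby_archive "$demo_dir"/standby_stream
rm -rf "$demo_dir"
```

With real segments, the four received copies of the same WAL file would be passed as arguments instead of the demo files.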

This happened reliably on the last 7 WALs.
All of these WALs have been switched manually - that is, none of them are
using all the 16 available megabytes.

Examining the different WALs with pg_xlogdump yields the exact same output:

/tmp$ /usr/lib/postgresql/9.6/bin/pg_xlogdump -b 000000010000000000000017 >
/tmp$ /usr/lib/postgresql/9.6/bin/pg_xlogdump -b 000000010000000000000017.dup > xlogdump.2
/tmp$ diff xlogdump.1 xlogdump.2

Performing a binary comparison reveals a difference in one or two bytes
that occurs multiple times, and all the differences are at offsets
beyond the xlog switch.

/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | sort|uniq|wc -l
/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | head
311299 4 0
/tmp$ printf "%x\n" 311299
/tmp$ tail -1 xlogdump.1
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 0/1700DDD8, prev 0/1700DD68, desc: SWITCH
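
The offset check above can be sketched as a small helper. This is only an illustration: the function name past_switch is made up, and it assumes the default 16 MB segment size, that cmp -l offsets are 1-based, and that the record length is the "tot" value printed by pg_xlogdump.

```shell
# Hypothetical helper: succeed if a cmp -l offset lies past the end of the
# SWITCH record. Assumes 16 MB WAL segments.
past_switch() {
    cmp_offset=$1    # decimal, 1-based, as printed by cmp -l (e.g. 311299)
    switch_lsn=$2    # as printed by pg_xlogdump (e.g. 0/1700DDD8)
    rec_len=$3       # total length of the SWITCH record (e.g. 24)

    low=${switch_lsn#*/}                         # part after the slash
    seg_off=$(( 0x$low % (16 * 1024 * 1024) ))   # offset within the segment
    end=$(( seg_off + rec_len ))                 # first byte after the record

    # cmp counts bytes from 1, file offsets count from 0
    [ $(( cmp_offset - 1 )) -ge "$end" ]
}

if past_switch 311299 0/1700DDD8 24; then
    echo "difference is after the SWITCH record"
else
    echo "difference is inside the used part of the segment"
fi
```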

The number of differing occurrences varies from WAL to WAL:

/tmp$ for i in 00000001000000000000001? ; do cmp -l ${i} ${i}.* | awk {'print $2" "$3'}| sort|uniq|wc -l; done

It's worth noting that, while the number of occurrences and their content
differ from WAL to WAL, within a given file the differing byte pairs are
always the same one or two:

/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | sort|uniq|wc -l
/tmp$ cmp -l 000000010000000000000017* |awk {'print $2" "$3'}|sort|uniq
27 16
4 0
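
Since cmp -l prints the byte number in decimal and the two byte values in octal, it may help to convert its output to hexadecimal before comparing it with a hex dump or an LSN. A minimal sketch follows; the function name cmp_to_hex is hypothetical, and it prints zero-based file offsets rather than cmp's 1-based byte numbers.

```shell
# Hypothetical helper: read "cmp -l" lines (decimal offset, two octal bytes)
# on stdin and print a zero-based hex offset plus both bytes in hex.
cmp_to_hex() {
    while read -r off a b; do
        # A leading 0 makes the shell parse the byte values as octal
        printf '0x%x %02x %02x\n' "$(( off - 1 ))" "$(( 0$a ))" "$(( 0$b ))"
    done
}

# Example using the first differing line shown earlier in this report.
printf '311299 4 0\n' | cmp_to_hex
```

On the real files this would be used as: cmp -l 000000010000000000000017 000000010000000000000017.dup | cmp_to_hex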

Both servers run on the exact same hardware and software.

I saved all the WALs, and I can upload them somewhere if you wish.
