BUG #14591: WALs differ on a standby node, from streaming to archive_command

From: alessandro(dot)grassi(at)2ndquadrant(dot)it
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14591: WALs differ on a standby node, from streaming to archive_command
Date: 2017-03-16 17:05:13
Message-ID: 20170316170513.1429.77904@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14591
Logged by: Alessandro Grassi
Email address: alessandro(dot)grassi(at)2ndquadrant(dot)it
PostgreSQL version: 9.6.2
Operating system: Ubuntu 16.04
Description:

Greetings,

I have two nodes running PostgreSQL 9.6 in streaming replication.
Each of them ships its WAL files to another system through both streaming
replication and archive_command at the same time.
This means that I should receive four identical copies of each WAL file.

However, only three of the four copies match: the two files received from
the primary node are identical, and the file received from the standby via
archive_command is identical to those two, but the file received from the
standby via streaming replication is different.

This has happened consistently for the last 7 WAL files.
All of these segments were switched manually, so none of them uses the
full 16 MB of the segment.

Examining the differing WAL files with pg_xlogdump yields exactly the same
output:

/tmp$ /usr/lib/postgresql/9.6/bin/pg_xlogdump -b 000000010000000000000017 > xlogdump.1
/tmp$ /usr/lib/postgresql/9.6/bin/pg_xlogdump -b 000000010000000000000017.dup > xlogdump.2
/tmp$ diff xlogdump.1 xlogdump.2
/tmp$

A binary comparison reveals one or two kinds of byte difference, each
repeated many times, and all of them lie at offsets beyond the XLOG SWITCH
record:

/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | sort | uniq | wc -l
992
/tmp$
/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | head -1
311299 4 0
/tmp$ printf "%x\n" 311299
4c003
/tmp$ tail -1 xlogdump.1
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 0/1700DDD8, prev 0/1700DD68, desc: SWITCH
/tmp$
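The claim that every difference lies past the SWITCH record can be checked mechanically. With 16 MB segments, the offset of an LSN within its segment is its low 24 bits, so the record at 0/1700DDD8 starts at offset 0xDDD8 (56792) and, with a total length of 24 bytes, ends at byte 56816, well before the first difference reported at byte 311299. A minimal sketch, assuming the default 16 MB segment size, GNU cmp (for its -n byte limit), and a SWITCH record that does not cross a WAL page boundary; lsn_seg_offset and same_up_to_switch are hypothetical helper names, not PostgreSQL tools:

```bash
#!/usr/bin/env bash
# Sketch only. Assumes 16 MB WAL segments (the 9.6 default) and GNU cmp.

# lsn_seg_offset LSN: byte offset of an LSN such as 0/1700DDD8 within its
# 16 MB segment, i.e. the low 24 bits of the part after the slash.
lsn_seg_offset() {
  local low=${1#*/}                  # "0/1700DDD8" -> "1700DDD8"
  echo $(( 0x$low & 0xFFFFFF ))
}

# same_up_to_switch FILE1 FILE2 SWITCH_LSN SWITCH_TOTAL_LEN
# Exit status 0 if both files agree on every byte up to the end of the
# SWITCH record (assumed not to cross a page boundary), 1 otherwise.
same_up_to_switch() {
  local end=$(( $(lsn_seg_offset "$3") + $4 ))
  cmp -s -n "$end" -- "$1" "$2"
}

# Hypothetical usage for segment 17, with the tot length from pg_xlogdump:
# same_up_to_switch 000000010000000000000017 000000010000000000000017.dup \
#   0/1700DDD8 24 && echo "identical up to the end of the SWITCH record"
```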

The number of distinct byte-value differences varies from file to file:

/tmp$ for i in 00000001000000000000001? ; do cmp -l ${i} ${i}.* | awk '{print $2" "$3}' | sort | uniq | wc -l; done
1
2
1
1
1
2
2
/tmp$
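The loop above prints bare counts without file names. A slightly more explicit variant is sketched below; it assumes every segment's second copy carries a .dup suffix, as 000000010000000000000017.dup does (the other copies' names are not shown), and summarize_diff is a hypothetical helper name. For each pair of files it reports both the number of differing bytes and the number of distinct value pairs:

```bash
#!/usr/bin/env bash
# Sketch only. summarize_diff is a hypothetical helper, not a PostgreSQL tool.

# summarize_diff FILE1 FILE2: count the bytes that differ and the distinct
# (byte in FILE1, byte in FILE2) value pairs, using the output of cmp -l.
summarize_diff() {
  cmp -l -- "$1" "$2" | awk '
    { total++; seen[$2 " " $3] = 1 }   # $2 and $3 are the two byte values
    END {
      n = 0
      for (k in seen) n++
      printf "differing_bytes=%d distinct_pairs=%d\n", total, n
    }'
}

# Hypothetical usage, assuming each second copy is named SEGMENT.dup:
# for f in 00000001000000000000001?; do
#   printf '%s: %s\n' "$f" "$(summarize_diff "$f" "$f.dup")"
# done
```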

Note that although the number of differing bytes and their values change
from file to file, within any single file the differences are always the
same one or two byte-value pairs:

/tmp$ cmp -l 000000010000000000000017 000000010000000000000017.dup | sort | uniq | wc -l
992
/tmp$ cmp -l 000000010000000000000017* | awk '{print $2" "$3}' | sort | uniq
27 16
4 0
/tmp$
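GNU cmp -l prints each byte number in decimal, counted from 1, and the two byte values in octal. Read that way, the pairs above are 0x17 versus 0x0e and 0x04 versus 0x00, and the first difference at byte 311299 (0x4c003) sits at zero-based file offset 0x4c002. A minimal sketch of that conversion; decode_cmp_line is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch only. decode_cmp_line is a hypothetical helper name.

# decode_cmp_line POS OCT1 OCT2: turn one line of cmp -l output (1-based
# decimal position, octal byte values) into a zero-based hex offset and
# hex byte values.
decode_cmp_line() {
  printf 'offset=0x%x file1=0x%02x file2=0x%02x\n' \
    "$(( $1 - 1 ))" "$(( 8#$2 ))" "$(( 8#$3 ))"
}

# Usage: decode every difference between two copies of a segment.
# cmp -l 000000010000000000000017 000000010000000000000017.dup |
#   while read -r pos a b; do decode_cmp_line "$pos" "$a" "$b"; done
```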

Both servers run on the exact same hardware and software.

I saved all the WAL files and can upload them somewhere if you wish.
