Why standby restores some WALs many times from archive?

From: Victor Yagofarov <xnasx(at)yandex(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org, eshkinkot(at)gmail(dot)com
Subject: Why standby restores some WALs many times from archive?
Date: 2017-12-28 12:02:42
Message-ID: 162721514462562@web39j.yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I tried to ask a question in pgsql-general@ but got no response.

I have postgres 9.4 standby with archive-based replication (via restore_command).
PostgreSQL 9.4.12 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

One day I noticed strange behavior in postgres logs: the same WAL has been restored many times:

"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"unexpected pageaddr 64A81/F9000000 in log segment 0000000200064AA80000005E, offset 0",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005D"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005E"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA80000005F"" from archive",,,,,,,,,""
"restored log file ""0000000200064AA800000060"" from archive",,,,,,,,,""

I enabled logging of successful execution in 'restore_command' script and realized that it's really happens: some WALs are recovering again and again.
And this is always connected with 'unexpected pageaddr' log message.

Some statistics of how many 'same WALs' have been restored in 3 minutes:
[vyagofarov(at)myserver:~] $ cut -f 2 -d '-' wals_restored.log | uniq -c -d | sort -r
9 0000000200064AA9000000A4
9 0000000200064AA80000005D
8 0000000200064AAA00000044
7 0000000200064AAA000000AB
6 0000000200064AAA000000ED
3 0000000200064AA7000000E8
2 0000000200064AA80000003D
2 0000000200064AA800000033
2 0000000200064AA800000028
2 0000000200064AA800000018
2 0000000200064AA800000004
2 0000000200064AA7000000FD

Is this a bug? It can be quite high traffic if the archive server is used by >10 clusters.

--
Best regards,
Victor Yagofarov

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-28 13:21:01 Re: pgsql: Add parallel-aware hash joins.
Previous Message Andrey Borodin 2017-12-28 11:37:48 Re: New gist vacuum.