BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby.

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: lichuancheng(at)highgo(dot)com
Subject: BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby.
Date: 2018-12-28 02:01:23
Message-ID: 15567-a9ff747c5e7c031c@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15567
Logged by: movead
Email address: lichuancheng(at)highgo(dot)com
PostgreSQL version: 10.5
Operating system: centos7
Description:

[Problem description]
We use pg9.5.7 with replica and i find that if some damaged wal record
arrived at standby then the startup process on standby try to restart
walreceiver process but never sucess and log below Infinite cycle on
standby.If i try to restart database now,it sucess.
(I try it on pg10.5,the same result.)
-----------------------------------------------------------------------------------------------------------------
2018-12-15 10:38:00.302 CST,,,30928,,5c1422d5.78d0,2,,2018-12-15 05:38:29
CST,,0,FATAL,57P01,"terminating walreceiver process due to administrator
command",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,8,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,9,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,10,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:05.308 CST,,,5546,,5c1276c1.15aa,11,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:10.314 CST,,,5546,,5c1276c1.15aa,12,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:15.319 CST,,,5546,,5c1276c1.15aa,13,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:20.325 CST,,,5546,,5c1276c1.15aa,14,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:25.331 CST,,,5546,,5c1276c1.15aa,15,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:30.336 CST,,,5546,,5c1276c1.15aa,16,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:35.342 CST,,,5546,,5c1276c1.15aa,17,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:40.348 CST,,,5546,,5c1276c1.15aa,18,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:45.353 CST,,,5546,,5c1276c1.15aa,19,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:50.359 CST,,,5546,,5c1276c1.15aa,20,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:55.364 CST,,,5546,,5c1276c1.15aa,21,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:39:00.370 CST,,,5546,,5c1276c1.15aa,22,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
-----------------------------------------------------------------------------------------------------------------

[Code review]
I read the code about startup process and walreciver process.

When a damaged wal record arrived:
1.The startup process shutdown the walreciver process use function
ShutdownWalRcv().
2.The startup process start the walreciver process use funtion
RequestXLogStreaming() by signal PMSIGNAL_START_WALRECEIVER.
3.The startup process try to read wal record.
And now reciver process does not startup completely,so the startup process
read the damaged wal record another times.
And startup process set walrcv->walRcvState = WALRCV_STOPPED use function
ShutdownWalRcv().
4.go to 2.
That's the infinite cycle.

[Question]
I wonder that if it is a bug or you just don't want restart walreciver
process.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-12-28 10:26:04 BUG #15568: pg_dump error
Previous Message Hugh Ranalli 2018-12-27 16:21:52 Re: BUG #15548: Unaccent does not remove combining diacritical characters