Quick Links

Re: BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby.

From:	Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To:	lichuancheng(at)highgo(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby.
Date:	2018-12-29 03:59:34
Message-ID:	CAK-MWwTYxLo9M5tzGToZeNj6YLKCcBhNETnGTbLexfj53RsW-w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

On Fri, Dec 28, 2018 at 8:10 PM PG Bug reporting form <
noreply(at)postgresql(dot)org> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 15567
> Logged by: movead
> Email address: lichuancheng(at)highgo(dot)com
> PostgreSQL version: 10.5
> Operating system: centos7
> Description:
>
> [Problem description]
> We use pg9.5.7 with replica and i find that if some damaged wal record
> arrived at standby then the startup process on standby try to restart
> walreceiver process but never sucess and log below Infinite cycle on
> standby.If i try to restart database now,it sucess.
> (I try it on pg10.5,the same result.)
>
> -----------------------------------------------------------------------------------------------------------------
> 2018-12-15 10:38:00.302 CST,,,30928,,5c1422d5.78d0,2,,2018-12-15 05:38:29
> CST,,0,FATAL,57P01,"terminating walreceiver process due to administrator
> command",,,,,,,,,""
> 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,8,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,9,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,10,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:05.308 CST,,,5546,,5c1276c1.15aa,11,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:10.314 CST,,,5546,,5c1276c1.15aa,12,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:15.319 CST,,,5546,,5c1276c1.15aa,13,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:20.325 CST,,,5546,,5c1276c1.15aa,14,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:25.331 CST,,,5546,,5c1276c1.15aa,15,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:30.336 CST,,,5546,,5c1276c1.15aa,16,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:35.342 CST,,,5546,,5c1276c1.15aa,17,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:40.348 CST,,,5546,,5c1276c1.15aa,18,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:45.353 CST,,,5546,,5c1276c1.15aa,19,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:50.359 CST,,,5546,,5c1276c1.15aa,20,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:38:55.364 CST,,,5546,,5c1276c1.15aa,21,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
> 2018-12-15 10:39:00.370 CST,,,5546,,5c1276c1.15aa,22,,2018-12-13 23:12:01
> CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
> 5A9/11E0A90",,,,,,,,,""
>
> -----------------------------------------------------------------------------------------------------------------
>
>
> [Code review]
> I read the code about startup process and walreciver process.
>
> When a damaged wal record arrived:
> 1.The startup process shutdown the walreciver process use function
> ShutdownWalRcv().
> 2.The startup process start the walreciver process use funtion
> RequestXLogStreaming() by signal PMSIGNAL_START_WALRECEIVER.
> 3.The startup process try to read wal record.
> And now reciver process does not startup completely,so the startup
> process
> read the damaged wal record another times.
> And startup process set walrcv->walRcvState = WALRCV_STOPPED use
> function
> ShutdownWalRcv().
> 4.go to 2.
> That's the infinite cycle.
>
> [Question]
> I wonder that if it is a bug or you just don't want restart walreciver
> process.
>
>
It look like the same issue as I reported in BUG #15151.
https://www.postgresql.org/message-id/flat/152353560977.31233.11406243320876395727%40wrigleys.postgresql.org
(unfortunately I didn't have time to explore it further then).

You have good description of what's going on under hood.

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

In response to

BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby. at 2018-12-28 02:01:23 from PG Bug reporting form

Responses

Re: Re: BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby. at 2018-12-29 05:08:18 from lichuancheng@highgo.com

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	lichuancheng@highgo.com	2018-12-29 05:08:18	Re: Re: BUG #15567: Wal receiver process restart failed when a damaged wal record arrived at standby.
Previous Message	PG Bug reporting form	2018-12-28 10:26:04	BUG #15568: pg_dump error