Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-05-11 18:00:00
Message-ID: 72bd036d-4f2a-8d50-b56e-6b1e3b9ba0a9@gmail.com
Lists: pgsql-bugs

11.05.2023 11:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17928
> ...
> `git bisect` for this behavior blames 3f1ce9734 (where
> XLogDecodeNextRecord() -> XLogReadRecordAlloc() call was introduced).
>
> A reproducer for the anomaly to follow.

The TAP test that demonstrates the issue is attached. To catch the failure
faster, I place copies of it in several directories (src/test/recoveryX/t),
add minimal Makefiles, and run the following on tmpfs:

for ((i=1;i<=10;i++)); do
  echo "iteration $i"
  NO_TEMP_INSTALL=1 parallel --halt now,fail=1 -j7 --linebuffer --tag \
    make -s check -C src/test/{} ::: \
    recovery1 recovery2 recovery3 recovery4 recovery5 recovery6 recovery7 || break
done

iteration 1
recovery1       +++ tap check in src/test/recovery1 +++
recovery2       +++ tap check in src/test/recovery2 +++
recovery3       +++ tap check in src/test/recovery3 +++
recovery4       +++ tap check in src/test/recovery4 +++
recovery5       +++ tap check in src/test/recovery5 +++
recovery6       +++ tap check in src/test/recovery6 +++
recovery7       +++ tap check in src/test/recovery7 +++
...
recovery5       # Restarting primary instance (49)
recovery3       # Restarting primary instance (49)
recovery7       # Restarting primary instance (49)
recovery2       Bailout called.  Further testing stopped:  pg_ctl stop failed
recovery2       FAILED--Further testing stopped: pg_ctl stop failed
recovery2       make: *** [Makefile:6: check] Error 255
parallel: This job failed:
make -s check -C src/test/recovery2

tail src/test/recovery2/tmp_check/log/099_restart_with_stanby_standby.log
2023-05-11 20:19:22.247 MSK [2046385] DETAIL:  End of WAL reached on timeline 1 at 3/64BDFF8.
2023-05-11 20:19:22.247 MSK [2046385] FATAL:  could not send end-of-streaming message to primary: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
        no COPY in progress
2023-05-11 20:19:22.248 MSK [2037134] FATAL:  invalid memory alloc request size 2021163525
2023-05-11 20:19:22.248 MSK [2037114] LOG:  startup process (PID 2037134) exited with exit code 1
2023-05-11 20:19:22.248 MSK [2037114] LOG:  terminating any other active server processes
2023-05-11 20:19:22.248 MSK [2037114] LOG:  shutting down due to startup process failure
2023-05-11 20:19:22.249 MSK [2037114] LOG:  database system is shut down
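
The directory setup described above (copies of the test under
src/test/recoveryX/t, each with a minimal Makefile) could be scripted roughly
as follows. This is only a sketch under stated assumptions: the function name
setup_recovery_copies and its arguments are hypothetical, and the generated
Makefile is modelled on the stock src/test/recovery/Makefile rather than taken
from the files actually used for this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): create
# src/test/recovery1..recoveryN in a PostgreSQL source tree, each holding
# a copy of the TAP test and a minimal Makefile.
setup_recovery_copies() {
    srcroot=$1      # top of the PostgreSQL source tree (assumption)
    testfile=$2     # path to the TAP test file to copy
    count=${3:-7}   # number of copies; 7 matches the -j7 run above
    i=1
    while [ "$i" -le "$count" ]; do
        dir="$srcroot/src/test/recovery$i"
        mkdir -p "$dir/t"
        cp "$testfile" "$dir/t/"
        # Minimal Makefile; contents are an assumption modelled on
        # src/test/recovery/Makefile.
        printf '%s\n' \
            "subdir = src/test/recovery$i" \
            'top_builddir = ../../..' \
            'include $(top_builddir)/src/Makefile.global' \
            '' \
            'check:' > "$dir/Makefile"
        # The recipe line must start with a tab; printf expands \t.
        printf '\t$(prove_check)\n' >> "$dir/Makefile"
        i=$((i + 1))
    done
}

# Example (paths are placeholders):
#   setup_recovery_copies /path/to/postgresql ./099_restart_with_stanby.pl 7
```

The test is copied rather than symlinked so that each directory gets its own
tmp_check output and the parallel runs do not interfere with one another.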

Best regards,
Alexander

Attachment: 099_restart_with_stanby.pl (application/x-perl, 1.7 KB)
