Re: WIP: WAL prefetch (another approach)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: WAL prefetch (another approach)
Date:	2021-05-04 19:47:41
Message-ID:	3967044.1620157661@sss.pgh.pa.us
Lists:	pgsql-hackers

I wrote:
> I suppose that if we're unable to reproduce it on at least one other box,
> we have to write it off as hardware flakiness.

BTW, that conclusion shouldn't distract us from the very real bug
that Andres identified. I was just scraping the buildfarm logs
concerning recent failures, and I found several recent cases
that match the symptom he reported:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2021-04-23%2022%3A27%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-04-21%2005%3A15%3A24
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-20%2002%3A03%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-05-04%2004%3A07%3A41
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2021-04-20%2021%3A08%3A59

They all show the standby in recovery/019_replslot_limit.pl failing
with symptoms like

2021-05-04 07:42:00.968 UTC [24707406:1] LOG: database system was shut down in recovery at 2021-05-04 07:41:39 UTC
2021-05-04 07:42:00.968 UTC [24707406:2] LOG: entering standby mode
2021-05-04 07:42:01.050 UTC [24707406:3] LOG: redo starts at 0/1C000D8
2021-05-04 07:42:01.079 UTC [24707406:4] LOG: consistent recovery state reached at 0/1D00000
2021-05-04 07:42:01.079 UTC [24707406:5] FATAL: invalid memory alloc request size 1476397045
2021-05-04 07:42:01.080 UTC [13238274:3] LOG: database system is ready to accept read only connections
2021-05-04 07:42:01.082 UTC [13238274:4] LOG: startup process (PID 24707406) exited with exit code 1

(BTW, the behavior seen here where the failure occurs *immediately*
after reporting "consistent recovery state reached" is seen in the
other reports as well, including Andres' version. I wonder if that
means anything.)
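
For anyone who wants to check other logs for the same signature, here is a minimal sketch, not the script used for the buildfarm scrape above. It assumes the log-line shape shown in the excerpt (timestamp, time zone, `[pid:seq]`, level, message). The names `LINE_RE`, `ALLOC_RE`, `SAMPLE`, and `find_alloc_failures` are made up for illustration. It reports each FATAL alloc failure and whether the same process's previous message was the "consistent recovery state reached" line.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   <date> <time> <tz> [<pid>:<seq>] <LEVEL>: <message>
LINE_RE = re.compile(r"^\S+ \S+ \S+ \[(\d+):(\d+)\] (\w+):\s+(.*)$")
ALLOC_RE = re.compile(r"invalid memory alloc request size (\d+)")

def find_alloc_failures(lines):
    """Return (pid, request_size, follows_consistency) per FATAL alloc error.

    follows_consistency is True when the previous message logged by the
    same process was "consistent recovery state reached ...".
    """
    last_msg = {}  # pid -> most recent message from that process
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a server log line
        pid, _seq, level, msg = m.groups()
        alloc = ALLOC_RE.match(msg)
        if level == "FATAL" and alloc:
            prev = last_msg.get(pid, "")
            hits.append((int(pid), int(alloc.group(1)),
                         prev.startswith("consistent recovery state reached")))
        last_msg[pid] = msg
    return hits

# Lines copied from the excerpt above.
SAMPLE = [
    "2021-05-04 07:42:01.050 UTC [24707406:3] LOG: redo starts at 0/1C000D8",
    "2021-05-04 07:42:01.079 UTC [24707406:4] LOG: consistent recovery state reached at 0/1D00000",
    "2021-05-04 07:42:01.079 UTC [24707406:5] FATAL: invalid memory alloc request size 1476397045",
    "2021-05-04 07:42:01.080 UTC [13238274:3] LOG: database system is ready to accept read only connections",
]

if __name__ == "__main__":
    for pid, size, follows in find_alloc_failures(SAMPLE):
        print(pid, size, follows)
```

Tracking the previous message per PID rather than per file is a deliberate choice here, since postmaster and startup-process lines interleave in the same log.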

regards, tom lane

In response to

Re: WIP: WAL prefetch (another approach) at 2021-05-04 13:46:12 from Tom Lane

Responses

Re: WIP: WAL prefetch (another approach) at 2021-05-05 01:08:35 from Andres Freund
