Re: Crash in new pgstats code

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <fujii(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Crash in new pgstats code
Date: 2022-04-18 10:45:07
Message-ID: CA+hUKGL4g=fn6Zne8o3hv4Ek=u8OWK4kcopZfmSj4Rp=ueSckA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 18, 2022 at 7:19 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Sat, Apr 16, 2022 at 02:36:33PM -0700, Andres Freund wrote:
> > which I haven't seen locally. Looks like we have some race between
> > startup process and walreceiver? That seems not great. I'm a bit
> > confused that walreceiver and archiving are both active at the same time
> > in the first place - that doesn't seem right as things are set up
> > currently.
>
> Yeah, that should be exclusively one or the other, never both.
> WaitForWALToBecomeAvailable() would be a hot spot when it comes to
> decide when a WAL receiver should be spawned by the startup process.
> Except from the recent refactoring of xlog.c or the WAL prefetch work,
> there has not been many changes in this area lately.

Hmm, well I'm not sure what is happening here and will try to dig
tomorrow, but one observation from some log scraping is that kestrel
logged similar output with "could not link file" several times before
the main prefetching commit (5dc0418). I looked back 3 months on
kestrel/HEAD and found these:

commit | log
---------+-------------------------------------------------------------------------------------------------------------------
411b913 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-03-27%2010:57:20&stg=recovery-check
3d067c5 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-03-29%2017:52:32&stg=recovery-check
cd7ea75 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-03-30%2015:25:03&stg=recovery-check
8e053dc | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-03-30%2020:27:44&stg=recovery-check
4e34747 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-04%2020:32:24&stg=recovery-check
01effb1 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-06%2007:32:40&stg=recovery-check
fbfe691 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-07%2005:10:05&stg=recovery-check
5dc0418 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-07%2007:51:00&stg=recovery-check
bd037dc | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-11%2022:00:58&stg=recovery-check
a4b5754 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-12%2004:40:44&stg=recovery-check
7129a97 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-15%2022:42:07&stg=recovery-check
9f4f0a0 | https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=kestrel&dt=2022-04-16%2020:05:34&stg=recovery-check
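For anyone who wants to repeat this kind of log scraping, here is a minimal sketch in Python. It is not the script used to produce the table above. The function names are hypothetical, the URL shape is simply copied from the links in the table, and fetching the log is left to the reader.

```python
from urllib.parse import quote

# Base of the stage-log links shown in the table above.
BASE = "https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl"


def stage_log_url(animal, snapshot, stage="recovery-check"):
    """Build a stage-log URL shaped like the links in the table.

    `snapshot` is the run timestamp, e.g. "2022-03-27 10:57:20".
    Only the space is percent-encoded, matching the links above.
    """
    return f"{BASE}?nm={animal}&dt={quote(snapshot, safe=':')}&stg={stage}"


def matching_lines(log_text, needle="could not link file"):
    """Return the lines of an already-downloaded log that contain `needle`."""
    return [line for line in log_text.splitlines() if needle in line]
```

Calling stage_log_url("kestrel", "2022-03-27 10:57:20") reproduces the first link in the table. Keeping the matching step separate from the download means it can be run over logs saved locally.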
