Re: Anti-critical-section assertion failure in mcxt.c reached by walsender

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Date: 2021-05-07 17:18:19
Message-ID: 43381.1620407899@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>> Oh, and I see that 13 has 9989d37d "Remove XLogFileNameP() from the
>> tree" to fix this exact problem.

> Hah, so that maybe explains why thorntail has only shown this in
> the v12 branch. Should we consider back-patching that?

Realizing that 9989d37d prevents the assertion failure, I went
to see if thorntail had shown EIO failures without assertions.
Looking back 180 days, I found these:

  sysname  |    branch     |      snapshot       |       stage        | l
-----------+---------------+---------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------
 thorntail | HEAD          | 2021-03-19 21:28:15 | recoveryCheck      | 2021-03-20 00:48:48.117 MSK [4089174:11] 008_fsm_truncation.pl PANIC: could not fdatasync file "000000010000000000000002": Input/output error
 thorntail | HEAD          | 2021-04-06 16:08:10 | recoveryCheck      | 2021-04-06 19:30:54.103 MSK [3355008:11] 008_fsm_truncation.pl PANIC: could not fdatasync file "000000010000000000000002": Input/output error
 thorntail | REL9_6_STABLE | 2021-04-12 02:38:04 | pg_basebackupCheck | pg_basebackup: could not fsync file "000000010000000000000013": Input/output error

So indeed the kernel-or-hardware problem is affecting other branches.
I suspect that the lack of reports in the pre-v12 branches is mostly
down to there having been many fewer runs on those branches within
the past couple of months.

regards, tom lane
