Re: Anti-critical-section assertion failure in mcxt.c reached by walsender

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Date: 2021-05-07 19:49:47
Message-ID: 20210507194947.etrgj7mpcv73mxef@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-05-07 10:29:58 -0400, Tom Lane wrote:
> I wrote:
> > 1. No wonder we could not reproduce it anywhere else. I've warned
> > the cfarm admins that their machine may be having hardware issues.
>
> I heard back from the machine's admin. The time of the crash I observed
> matches exactly to these events in the kernel log:
>
> May 07 03:31:39 gcc202 kernel: dm-0: writeback error on inode 2148294407, offset 0, sector 159239256
> May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
> May 07 03:31:39 gcc202 kernel: blk_update_request: I/O error, dev vdiskc, sector 157618896 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0
>
> So it's not a mirage. The admin seems to think it might be a kernel
> bug though.

Isn't this a good reason to have at least some tests run with fsync=on?

It makes a ton of sense for buildfarm animals to disable fsync to
achieve acceptable performance. Having something in there that
nevertheless does some light exercise of the fsync code doesn't seem
bad?

Greetings,

Andres Freund
