Re: Anti-critical-section assertion failure in mcxt.c reached by walsender

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Anti-critical-section assertion failure in mcxt.c reached by walsender
Date: 2021-05-08 04:57:54
Message-ID: CA+hUKGKfrXnuyk0Z24m8x4_eziuC3kLSaCmEeKPO1DVU9t-qtQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 8, 2021 at 2:30 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11

That's -EAGAIN (assuming errnos match x86) and I guess it indicates
that VDC_MAX_RETRIES is exceeded here:

https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L451
https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L526

One theory is that the hypervisor/host is occasionally too swamped to
service the request queue within a window of about 10ms:
vio_ldc_send() itself retries 1000 times with a 1us sleep, the outer
loop tries ten times (10 x 1000 x 1us = ~10ms), and ldc.c's
write_nonraw() reports -EAGAIN when there is no space for the message.
(Alternatively, it's trying to send a message that's too big for the
channel, the channel is corrupted by bugs, or my fly-by reading of
this code, which I'd never heard of before now, is just way off...)
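
To make the ~10ms arithmetic concrete, here is a rough userspace model of
the nested retry structure as I read it.  It is a sketch, not the
driver's code: fake_channel, fake_write(), inner_send(), outer_send()
and the three constants are hypothetical names built only from the
figures quoted above (1000 inner tries, 1us sleep, ten outer tries), and
the clock is simulated rather than real.

```c
#include <errno.h>

/* Assumed figures from the discussion above, not copied from sunvdc.c. */
#define INNER_RETRIES  1000  /* tries per vio_ldc_send()-like call */
#define INNER_DELAY_US 1     /* sleep between inner tries */
#define OUTER_RETRIES  10    /* outer loop, VDC_MAX_RETRIES-like limit */

/* Hypothetical stand-in for the channel to the host. */
struct fake_channel {
    long free_after_us;  /* host frees queue space at this simulated time */
    long elapsed_us;     /* simulated clock, advanced by each "sleep" */
};

/* Models write_nonraw(): -EAGAIN while there is no space for the message. */
int fake_write(struct fake_channel *ch)
{
    return ch->elapsed_us >= ch->free_after_us ? 0 : -EAGAIN;
}

/* Models the inner loop: retry with a 1us pause between attempts. */
int inner_send(struct fake_channel *ch)
{
    int err = -EAGAIN;

    for (int i = 0; i < INNER_RETRIES; i++) {
        err = fake_write(ch);
        if (err != -EAGAIN)
            break;
        ch->elapsed_us += INNER_DELAY_US;  /* simulated 1us sleep */
    }
    return err;
}

/* Models the outer loop: give up with -EAGAIN after ten inner rounds. */
int outer_send(struct fake_channel *ch)
{
    int err = -EAGAIN;

    for (int i = 0; i < OUTER_RETRIES; i++) {
        err = inner_send(ch);
        if (err != -EAGAIN)
            break;
    }
    return err;
}

/* Total time the host gets to free up space before the caller sees -EAGAIN. */
long worst_case_budget_us(void)
{
    return (long) OUTER_RETRIES * INNER_RETRIES * INNER_DELAY_US;
}
```

In this model a host that frees space after 5000us is fine, while one
that stays busy for the full 10000us makes outer_send() return -EAGAIN,
which is the err=-11 case in the kernel log if the theory holds.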
