Re: stress test for parallel workers

From: Mark Wong <mark(at)2ndQuadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-11 20:28:53
Message-ID: 20191011202853.GA23809@2ndQuadrant.com
Lists: pgsql-hackers

On Sat, Oct 12, 2019 at 08:41:12AM +1300, Thomas Munro wrote:
> On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > This matches up with the intermittent infinite_recurse failures
> > we've been seeing in the buildfarm. Those are happening across
> > a range of systems, but they're (almost) all Linux-based ppc64,
> > suggesting that there's a longstanding arch-specific kernel bug
> > involved. For reference, I scraped the attached list of such
> > failures in the last three months. I wonder whether we can get
> > the attention of any kernel hackers about that.
>
> Yeah, I don't know anything about this stuff, but I was also beginning
> to wonder if something is busted in the arch-specific fault.c code
> that checks if stack expansion is valid[1], in a way that fails with a
> rapidly growing stack, well timed incoming signals, and perhaps
> Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
> boxes that failed or if it could be relevant here). Perhaps the
> arbitrary tolerances mentioned in that comment are relevant.

This specific one (wobbegon) is OpenStack/KVM[2], for what it's worth...

"... cluster is an OpenStack based cluster offering POWER8 & POWER9 LE
instances running on KVM ..."

But to keep you on your toes, some of my ppc animals run in Docker within
another OpenStack/KVM instance...

Regards,
Mark

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244
[2] https://osuosl.org/services/powerdev/

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
