Re: stress test for parallel workers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Mark Wong <mark(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-11 19:41:12
Message-ID: CA+hUKGKNgufn12Uh4iEh5y=JkEnUBWnLLmi8L4zwzMunFeKwSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This matches up with the intermittent infinite_recurse failures
> we've been seeing in the buildfarm. Those are happening across
> a range of systems, but they're (almost) all Linux-based ppc64,
> suggesting that there's a longstanding arch-specific kernel bug
> involved. For reference, I scraped the attached list of such
> failures in the last three months. I wonder whether we can get
> the attention of any kernel hackers about that.

Yeah, I don't know anything about this stuff, but I was also beginning
to wonder if something is busted in the arch-specific fault.c code
that checks if stack expansion is valid[1], in a way that fails with a
rapidly growing stack, well timed incoming signals, and perhaps
Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
boxes that failed or if it could be relevant here). Perhaps the
arbitrary tolerances mentioned in that comment are relevant.

[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-10-11 19:48:52 Re: Connect as multiple users using single client certificate
Previous Message Kyle Bateman 2019-10-11 19:28:45 Re: Connect as multiple users using single client certificate