Re: stress test for parallel workers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Mark Wong <mark(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-11 21:03:22
Message-ID: CA+hUKG+C6uPF1cNZkW8xg+NAgorW9Q5DQusGGCAj+K8sb8m_aQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 12, 2019 at 9:40 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
> >> ... So it's really hard to explain
> >> that as anything except a kernel bug: sometimes, the kernel
> >> doesn't give us as much stack as it promised it would. And the
> >> machine is not loaded enough for there to be any rational
> >> resource-exhaustion excuse for that.
>
> > Linux expands stack space only on demand, thus it's possible to run out
> > of stack space while there ought to be stack space. Unfortunately that
> > happens during a stack expansion, which means there's no easy place to report
> > that. I've seen this be hit in production on busy machines.
>
> As I said, this machine doesn't seem busy enough for that to be a
> tenable excuse; there's nobody but me logged in, and the buildfarm
> critter isn't running.

Yeah. As I speculated in the other thread[1], the straightforward
can't-allocate-any-more-space-but-no-other-way-to-tell-you-that case,
i.e., the explanation that doesn't involve a bug in Linux or PostgreSQL,
seems unlikely unless we also see other more obvious signs of
occasional overcommit problems (i.e. not during stack expansion) on
those hosts, doesn't it? How likely is it that this 1-2MB of stack
space is the straw that breaks the camel's back, every time?
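A minimal sketch (not from this thread) of how one might check a host for the "more obvious signs" of overcommit trouble. It assumes a Linux host exposing `/proc/sys/vm/overcommit_memory` and `/proc/meminfo`. The helper names `parse_meminfo`, `commit_headroom_kb` and `describe_stack_limit` are hypothetical. The script reports three things:

- the overcommit mode;
- the headroom between `CommitLimit` and `Committed_AS`;
- the process's stack rlimit.

```python
import resource


def parse_meminfo(text):
    """Hypothetical helper: parse /proc/meminfo-style 'Key:  value kB' lines into ints (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """Hypothetical helper: CommitLimit minus Committed_AS.

    The kernel only enforces CommitLimit when vm.overcommit_memory == 2,
    so in other modes this number is informational.
    """
    return info["CommitLimit"] - info["Committed_AS"]


def describe_stack_limit():
    """Hypothetical helper: report the soft/hard RLIMIT_STACK of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value // 1024} kB"

    return f"stack rlimit: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    with open("/proc/sys/vm/overcommit_memory") as f:
        mode = f.read().strip()
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"vm.overcommit_memory = {mode}")
    print(f"commit headroom: {commit_headroom_kb(info)} kB")
    print(describe_stack_limit())
```

The script is only a starting point. Ample commit headroom together with no allocation failures elsewhere would argue against the plain resource-exhaustion explanation. It cannot by itself rule out a kernel bug in stack expansion.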

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ_MkqdEH-LmmebhNLSFeyWwvYVXfPaz3A2_p27EQfZwA%40mail.gmail.com
