Re: stress test for parallel workers

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, mark(at)2ndquadrant(dot)com, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: stress test for parallel workers
Date: 2019-10-11 20:40:41
Message-ID: 19525.1570826441@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
>> ... So it's really hard to explain
>> that as anything except a kernel bug: sometimes, the kernel
>> doesn't give us as much stack as it promised it would. And the
>> machine is not loaded enough for there to be any rational
>> resource-exhaustion excuse for that.

> Linux expands stack space only on demand, thus it's possible to run out
> of stack space while there ought to be stack space. Unfortunately that
> happens during a stack expansion, which means there's no easy place to
> report it. I've seen this be hit in production on busy machines.

As I said, this machine doesn't seem busy enough for that to be a
tenable excuse; there's nobody but me logged in, and the buildfarm
critter isn't running.

> I wonder if the machine is configured with overcommit_memory=2,
> i.e. don't overcommit. cat /proc/sys/vm/overcommit_memory would tell.

$ cat /proc/sys/vm/overcommit_memory
0

> What does grep -E '^(Mem|Commit)' /proc/meminfo show while it's
> happening?

idle:

$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 36864 kB
MemAvailable: 1779584 kB
CommitLimit: 1037376 kB
Committed_AS: 412480 kB

a few captures while regression tests are running:

$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 8512 kB
MemAvailable: 1819264 kB
CommitLimit: 1037376 kB
Committed_AS: 371904 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 32640 kB
MemAvailable: 1753792 kB
CommitLimit: 1037376 kB
Committed_AS: 585984 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 56640 kB
MemAvailable: 1695744 kB
CommitLimit: 1037376 kB
Committed_AS: 568768 kB

> What does the signal information say? You can see it with
> p $_siginfo
> after receiving the signal. A SIGSEGV here, I assume.

(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0},
_timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}

> Yeah, that seems like it might be good. But we have to be careful too, as
> there are some things we do want to be interruptible from within a signal
> handler. We start some processes from within one after all...

The proposed patch has zero effect on what the signal mask will be inside
a signal handler, only on the transient state during handler entry/exit.

regards, tom lane
