Re: Why is infinite_recurse test suddenly failing?

From: Mark Wong <mark(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Why is infinite_recurse test suddenly failing?
Date: 2019-05-14 15:31:37
Message-ID: 20190514153137.GC10216@2ndQuadrant.com
Lists: pgsql-hackers

On Fri, May 10, 2019 at 05:26:43PM -0400, Andrew Dunstan wrote:
>
> On 5/10/19 3:35 PM, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> >> On 2019-05-10 11:38:57 -0400, Tom Lane wrote:
> >>> I am wondering if, somehow, the stack depth limit seen by the postmaster
> >>> sometimes doesn't apply to its children. That would be pretty wacko
> >>> kernel behavior, especially if it's only intermittently true.
> >>> But we're running out of other explanations.
> >> I wonder if this is a SIGSEGV that actually signals an OOM
> >> situation. Linux, if it can't actually extend the stack on-demand due to
> >> OOM, sends a SIGSEGV. The signal has that information, but
> >> unfortunately the buildfarm code doesn't print it. p $_siginfo would
> >> show us some of that...
> >> Mark, how tight is the memory on that machine? Does dmesg have any other
> >> information (often segfaults are logged by the kernel with the code
> >> IIRC).
> > It does sort of smell like a resource exhaustion problem, especially
> > if all these buildfarm animals are VMs running on the same underlying
> > platform. But why would that manifest as "you can't have a measly two
> > megabytes of stack" and not as any other sort of OOM symptom?
> >
> > Mark, if you don't mind modding your local copies of the buildfarm
> > script, I think what Andres is asking for is a pretty trivial addition
> > in PGBuild/Utils.pm's sub get_stack_trace:
> >
> > my $cmdfile = "./gdbcmd";
> > my $handle;
> > open($handle, '>', $cmdfile) || die "opening $cmdfile: $!";
> > print $handle "bt\n";
> > + print $handle "p $_siginfo\n";
> > close($handle);
> >
> >
>
>
> I think we'll need to write that as:
>
>
>     print $handle 'p $_siginfo',"\n";

OK, I have added this to all of my buildfarm animals now.

I think I have also caught up on this thread, but let me know if I
missed anything.
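
For reference, here is a minimal sketch of why the single-quoted form
quoted above is needed. Inside a double-quoted Perl string, $_siginfo is
parsed as a Perl variable, so gdb would never see the name (under
"use strict" the script would not even compile). The helper name
gdb_commands is hypothetical and only for illustration; it is not the
actual code in PGBuild/Utils.pm.

```perl
use strict;
use warnings;

# Hypothetical helper (not the buildfarm's code): builds the text that
# would be written to the gdb command file.
sub gdb_commands
{
    my @lines;

    # Backtrace first, as in the existing script.
    push @lines, "bt\n";

    # Single quotes stop Perl from interpolating $_siginfo as one of its
    # own variables. The newline stays in a separate double-quoted
    # string so that the \n escape is still interpreted.
    push @lines, 'p $_siginfo' . "\n";

    return join('', @lines);
}

print gdb_commands();
```

Escaping the dollar sign inside double quotes ("p \$_siginfo\n") should
be equivalent; the two-string form above just mirrors the line quoted
earlier.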

Regards,
Mark

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
