Re: 9.4 beta1 crash on Debian sid/i386

From: Christoph Berg <christoph(dot)berg(at)credativ(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, bastian(dot)blank(at)credativ(dot)de
Subject: Re: 9.4 beta1 crash on Debian sid/i386
Date: 2014-05-19 14:47:17
Message-ID: 20140519144717.GG7296@msgid.df7cb.de
Lists: pgsql-hackers

Re: Andres Freund 2014-05-19 <20140519141221(dot)GC5098(at)alap3(dot)anarazel(dot)de>
> On 2014-05-19 09:53:11 -0400, Tom Lane wrote:
> > I think throwing an error out of a SIGBUS handler is right out. There
> > would be no way to know exactly what code we were interrupting. It's
> > the same reason we don't let, eg, the SIGALRM handler throw a timeout
> > error directly (in most places anyway).

Right. I just mentioned that for completeness.

> Agreed. I think if we really, really feel the need to do something about
> this - which I don't - we could allocate a separate stack very early on
> and use that.

Hmm, that'd be an extension of the other idea, "write something deep
in the stack on startup". This is probably less evil, though I agree
it's a big hammer for solving something that should probably be fixed
elsewhere.
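
For reference, the "separate stack" idea maps onto the POSIX calls
sigaltstack() and sigaction() with SA_ONSTACK. What follows is only a
minimal sketch under that assumption, not PostgreSQL code: the helper
names, the message text and the exit code are made up for illustration.
In line with the point above about not throwing errors from the
handler, it only writes a message and exits.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: memory backing the alternate signal stack. */
void *alt_stack_mem = NULL;

/* Allocate an alternate stack early, before the main stack can overflow. */
int install_alt_signal_stack(size_t size)
{
    stack_t ss;

    alt_stack_mem = malloc(size);
    if (alt_stack_mem == NULL)
        return -1;

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = alt_stack_mem;
    ss.ss_size = size;
    ss.ss_flags = 0;
    return sigaltstack(&ss, NULL);
}

/* Only async-signal-safe calls here: report and exit, never longjmp out. */
void fatal_signal_handler(int signo)
{
    static const char msg[] = "fatal signal, possibly out of stack\n";
    ssize_t rc;

    (void) signo;
    rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void) rc;
    _exit(2);
}

/* Run the handler for signo on the alternate stack (SA_ONSTACK). */
int install_fatal_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    return sigaction(signo, &sa, NULL);
}
```

A caller would run install_alt_signal_stack() once at startup with a
size comfortably above MINSIGSTKSZ, then install_fatal_handler(SIGBUS)
and install_fatal_handler(SIGSEGV).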

> > >> * PostgreSQL allocates lots of heap using brk() instead of mmap()
> >
> > > It doesn't really do that, btw. It's the libc's mmap that makes those
> > > decisions, not postgres.
> >
> > It occurs to me that maybe this is a glibc bug, not a kernel bug?
>
> You think malloc() should try to be careful when calling brk() and check
> beforehand whether it'll conflict with stack_base + RLIMIT_STACK? That's
> not a bad argument, but it still seems a really bad choice to leave that
> little space for the heap. Especially when it's dependent on -pie being
> used.

It's probably both, the default ASLR layout providing too little heap,
plus malloc() running into the stack area - I'm not sure if the former
is the kernel's fault or libc/ld.so's, probably they need to work
together on that anyway.
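
To make the "stack_base + RLIMIT_STACK" check concrete, here is a rough
sketch of how one could estimate the room left between the program
break and the area reserved for stack growth. It is not what glibc or
the kernel does. It assumes a downward-growing stack with the brk heap
below it (as on Linux/i386), uses the address of a local variable as a
stand-in for the real stack base (so it slightly underestimates), and
the function names are invented for this example.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Bytes the brk heap could still grow before reaching the lowest
 * address the stack is allowed to use. Returns 0 if the two regions
 * already meet or overlap.
 */
uint64_t heap_room(uintptr_t brk_addr, uintptr_t stack_ref,
                   uint64_t stack_limit)
{
    uintptr_t stack_floor;

    if (stack_limit >= stack_ref)
        return 0;               /* limit reaches down past address 0 */
    stack_floor = stack_ref - (uintptr_t) stack_limit;
    if (stack_floor <= brk_addr)
        return 0;               /* heap already inside the stack area */
    return (uint64_t) (stack_floor - brk_addr);
}

/*
 * Gather the live values for the current process. Returns 0 and fills
 * *room on success, -1 if the stack limit is unlimited or unavailable.
 */
int current_heap_room(uint64_t *room)
{
    struct rlimit rl;
    char marker;                /* approximates the current stack position */
    void *brk_now;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    brk_now = sbrk(0);          /* current program break */
    if (brk_now == (void *) -1)
        return -1;
    *room = heap_room((uintptr_t) brk_now, (uintptr_t) &marker,
                      (uint64_t) rl.rlim_cur);
    return 0;
}
```

Comparing the result of current_heap_room() for a build with and
without -pie on a 32-bit machine should show how much heap the layout
actually leaves.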

Disabling -pie for all 32-bit archs seems to be the way to go for us
now.

Does this topic warrant being mentioned in the docs?

Christoph
