Re: 9.4 beta1 crash on Debian sid/i386

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christoph Berg <cb(at)df7cb(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4 beta1 crash on Debian sid/i386
Date: 2014-05-18 04:00:11
Message-ID: 9058.1400385611@sss.pgh.pa.us
Lists: pgsql-hackers

Christoph Berg <cb(at)df7cb(dot)de> writes:
> Re: Tom Lane 2014-05-14 <1357(dot)1400028161(at)sss(dot)pgh(dot)pa(dot)us>
>> It would appear that something is wrong with check_stack_depth(),
>> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.

> ulimit -s is 8192 (kB); max_stack_depth is 2MB.

> check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> and I can see stack_base_ptr - &stack_top_loc grow over repeated
> invocations of the function (stack_depth itself is optimized out).
> Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".

Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
not getting set, that would disable the error report. But on most
architectures that would also result in silly values for the pointer
difference, so I doubt this is the issue.
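For readers following along, the check under discussion can be sketched roughly as below. This is a simplified stand-in written for illustration, not the backend source: the function name stack_is_too_deep and its parameters are hypothetical, and only the names stack_base_ptr and max_stack_depth_bytes come from the thread. It shows why a NULL base pointer would silently disable the error report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the real check_stack_depth().
 * base:  address recorded near the bottom of the stack at process start
 *        (stack_base_ptr in the thread); NULL means "never initialized".
 * top:   address of a local variable in the current frame.
 * limit: allowed depth in bytes (max_stack_depth_bytes in the thread).
 */
bool
stack_is_too_deep(const char *base, const char *top, long limit)
{
    long depth;

    /* With no base pointer recorded, the check reports nothing. */
    if (base == NULL)
        return false;

    /* Integer arithmetic on the addresses avoids relying on pointer
     * subtraction between unrelated objects. */
    depth = (long) ((intptr_t) base - (intptr_t) top);

    /* The stack may grow in either direction, so use the magnitude. */
    if (depth < 0)
        depth = -depth;

    return depth > limit;
}
```

With the 2MB limit reported above, a caller would pass 2097152 as the limit and the address of one of its own locals as top.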

> Interestingly, the Debian buildd managed to run the testsuite for
> i386, while I could reproduce the problem on the pgapt build machine
> and on my notebook, so there must be some system difference. Possibly
> the reason is these two machines are running a 64bit kernel and I'm
> building in a 32bit chroot, though that hasn't been a problem before.

I'm suspicious that something has changed in your build environment,
because that stack-checking logic hasn't changed since these commits:

Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Branch: master Release: REL9_2_BR [ef3883d13] 2012-04-08 19:07:55 +0300
Branch: REL9_1_STABLE Release: REL9_1_4 [ef29bb1f7] 2012-04-08 19:08:13 +0300
Branch: REL9_0_STABLE Release: REL9_0_8 [77dc2b0a4] 2012-04-08 19:09:12 +0300
Branch: REL8_4_STABLE Release: REL8_4_12 [89da5dc6d] 2012-04-08 19:09:26 +0300
Branch: REL8_3_STABLE Release: REL8_3_19 [ddeac5dec] 2012-04-08 19:09:37 +0300

Do stack-depth checking in all postmaster children.

We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g. an index expression will bring down the whole cluster.

The lack of reports from the buildfarm or other users is also evidence
against there being a widespread issue here.

A different thought: I have heard of environments in which the available
stack depth is much less than what ulimit would suggest because the ulimit
space gets split up for multiple per-thread stacks. That should not be
happening in a Postgres backend, since we don't do threading, but I'm
running out of ideas to investigate ...
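One way to see what the kernel actually reports to the process is to call getrlimit(RLIMIT_STACK) directly inside the chroot. The following is a minimal sketch under stated assumptions: the helper names stack_rlimit_bytes and limit_covers are hypothetical, and the comparison ignores any safety margin the server keeps below the limit.

```c
#include <limits.h>
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: return the soft RLIMIT_STACK value in bytes,
 * or -1 if the limit is unlimited or cannot be read.
 */
long
stack_rlimit_bytes(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    /* rlim_t can be wider than long on 32-bit builds; clamp if so. */
    if (rl.rlim_cur > (rlim_t) LONG_MAX)
        return LONG_MAX;
    return (long) rl.rlim_cur;
}

/*
 * Hypothetical helper: does the reported limit leave room for the
 * configured maximum depth? An unlimited (-1) limit always does.
 */
bool
limit_covers(long rlimit_bytes, long max_depth_bytes)
{
    return rlimit_bytes < 0 || rlimit_bytes > max_depth_bytes;
}
```

For the numbers quoted in this thread, limit_covers(8192L * 1024, 2097152L) is true, so if the value returned inside the 32-bit chroot is much smaller than 8388608, that would point at the environment rather than at the check itself.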

regards, tom lane
