IA64 versus effective stack limit

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
Subject: IA64 versus effective stack limit
Date: 2010-11-06 17:34:46
Message-ID: 21563.1289064886@sss.pgh.pa.us
Lists: pgsql-hackers

Sergey was kind enough to lend me use of buildfarm member dugong
(IA64, Debian Etch) so I could poke into why its behavior in the
recursion-related regression tests was so odd. I had previously
tried and failed to reproduce the behavior on a Red Hat IA64 test
machine (running RHEL of course) so I was feeling a bit baffled.
Here's what I found out:

1. Debian Etch has the make-resets-the-stack-rlimit bug that I reported
yesterday, whereas the RHEL version I was testing had the fix
for that. So that's why I couldn't reproduce max_stack_depth getting
set to 100kB.

2. IA64 is a very weird architecture: it has two separate hardware
stacks. One is reserved for saving registers, which IA64 has got
a lot of, and the other "normal" stack holds everything else.
The method we use in check_stack_depth (i.e., measure the difference
in addresses of local variables) effectively measures the depth of
the normal stack. I don't know of any simple way to find out the
depth of the register stack. You can get gdb to tell you about
both stacks, though. I found out that with PG HEAD, each fmgr_sql
call level in the "infinite_recurse()" regression test consumes 160
bytes of normal stack and 928 bytes of register stack.
This is with gcc (I got identical numbers on dugong and the
RHEL machine). But, if you build PG with icc as the buildfarm
critter is doing, that bloats to 3232 bytes of normal stack and
2832 bytes of register stack. For comparison, my x86_64 Fedora 13 box
uses 704 bytes of stack per recursion level.
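
To make the address-difference method in point 2 concrete, here is a
minimal sketch in C. The names set_stack_base and approx_stack_depth are
made up for illustration and are not PostgreSQL's actual functions. It
only shows the general technique of comparing the addresses of local
variables, which by construction sees the normal stack and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not PostgreSQL's actual code. */
static uintptr_t stack_base_addr;   /* address noted in an outer frame */

/* Record a reference address; call once near the top of the call chain. */
void set_stack_base(void)
{
    char marker;

    stack_base_addr = (uintptr_t) &marker;
}

/* Approximate bytes of normal stack between the base frame and here. */
size_t approx_stack_depth(void)
{
    char here;
    uintptr_t addr = (uintptr_t) &here;

    assert(stack_base_addr != 0);   /* set_stack_base() must run first */

    /* Absolute difference, so the stack growth direction does not matter. */
    if (addr > stack_base_addr)
        return (size_t) (addr - stack_base_addr);
    return (size_t) (stack_base_addr - addr);
}
```

Nothing in this measurement can observe a second stack that lives in a
different memory region, which is exactly the IA64 problem.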

I don't know why icc is so much worse than gcc on this measure of
stack depth consumption, but clearly the combination of that and
the 100kB max_stack_depth explains why dugong is failing to do
very many levels of recursion before erroring out. Fixing
get_stack_depth_rlimit as I proposed yesterday should give it
a reasonable stack depth.
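
As a rough check of that explanation, the sketch below divides a stack
budget by the measured per-level normal-stack cost. The function name
levels_within_limit is hypothetical, and the estimate assumes the whole
budget is available to the recursion, ignoring whatever stack is already
in use when the recursion starts.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many recursion levels fit in limit_bytes
 * if each level costs bytes_per_level of normal stack.
 */
long levels_within_limit(long limit_bytes, long bytes_per_level)
{
    assert(bytes_per_level > 0);
    return limit_bytes / bytes_per_level;
}
```

With a 100kB limit (102400 bytes), levels_within_limit(102400, 3232)
gives 31 levels for the icc build, against 640 for gcc on IA64 at 160
bytes per level and 145 on x86_64 at 704 bytes per level.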

However, we're not out of the woods yet. Because check_stack_depth is
only checking the normal stack depth, and the two stacks don't grow at
the same rate, it's possible for a crash to occur due to running out of
register stack space. We haven't seen that happen on dugong because,
as shown above, with icc the register stack grows more slowly than the
normal stack (at least for the specific functions we care about here).
But with gcc, the same code eats register stack a lot faster than normal
stack --- and in fact I observed a crash in the infinite_recurse() test
when building with gcc and testing in a manually-started postmaster.
The manually-started postmaster was under ulimit -s 8MB, which
apparently Debian interprets as "8MB for normal stack and another 8MB
for register stack". Even though check_stack_depth was trying to
constrain the normal stack to just 2MB, the register stack grew 5.8
times faster and so blew through 8MB before check_stack_depth thought
there was a problem. Raising ulimit -s allowed it to work.
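
The arithmetic behind that crash can be sketched as follows. The struct
and function names are hypothetical, and the model assumes both stacks
start empty and grow by a fixed amount per recursion level, which is
only an approximation of the real behavior.

```c
#include <assert.h>

/* Outcome of racing the two IA64 stacks against their limits. */
typedef struct
{
    long levels_to_soft_limit;      /* levels before the normal-stack check fires */
    long levels_to_reg_overflow;    /* levels before the register stack is full */
    int  crashes_first;             /* nonzero if the register stack fills first */
} StackRace;

/* Hypothetical helper; all sizes are in bytes. */
StackRace race_stacks(long soft_limit, long normal_per_level,
                      long reg_limit, long reg_per_level)
{
    StackRace r;

    assert(normal_per_level > 0 && reg_per_level > 0);
    r.levels_to_soft_limit = soft_limit / normal_per_level;
    r.levels_to_reg_overflow = reg_limit / reg_per_level;
    r.crashes_first = (r.levels_to_reg_overflow < r.levels_to_soft_limit);
    return r;
}
```

For the gcc numbers (2MB soft limit, 160 bytes of normal stack and 928
bytes of register stack per level, 8MB register stack), the check would
fire after about 13107 levels but the register stack is full after about
9039, so the crash comes first. For the icc numbers (3232 and 2832 bytes)
the check fires after about 648 levels, long before the register stack
would fill at about 2962.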

(Curiously, I did *not* see the same type of crash on the RHEL machine.
I surmise that Red Hat has tweaked the kernel to allow the register
stack to grow more than the normal stack, but I haven't tried to verify
that.)

So this means we have a problem. To some extent it's new in HEAD:
before the changes I made last week to not keep a local
FunctionCallInfoData in ExecMakeFunctionResult, there would have been at
least another 900 bytes of normal stack per recursion level, so even
with gcc the register stack would grow slower than normal stack in this
test, and you wouldn't have seen any crash in the regression tests.
But I'm sure there are lots of other potentially recursive routines in
PG where register stack could grow faster than normal stack, so we
shouldn't suppose that this fmgr_sql recursion is the only trouble spot.

As I said above, I don't know of any good way to measure register stack
depth directly. It's probably possible to find out by asking the kernel
or something like that, but we surely do not want to introduce a kernel
call into check_stack_depth(). So a good solution for this is hard to
see. The best idea I have at the moment is to reduce the reported stack
limit by some arbitrary factor, i.e., do something like

#ifdef __IA64__
val /= 8;
#endif

in get_stack_depth_rlimit(). Anyone have a better idea?
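
For context, here is a hedged sketch of how such a scaling might sit in a
function that reads the stack rlimit. The name effective_stack_limit is
hypothetical and the body is not the actual get_stack_depth_rlimit()
code; it only uses the standard getrlimit() call and applies the
proposed division on IA64. The factor 8 is the arbitrary value suggested
above, not a derived one.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: usable stack limit in bytes, or -1 if the limit
 * is unknown or unlimited.
 */
long effective_stack_limit(void)
{
    struct rlimit rlim;
    long val;

    if (getrlimit(RLIMIT_STACK, &rlim) < 0)
        return -1;              /* could not read the limit */
    if (rlim.rlim_cur == RLIM_INFINITY)
        return -1;              /* no limit set */

    /* Clamp to what fits in a long before converting. */
    if (rlim.rlim_cur > (rlim_t) LONG_MAX)
        val = LONG_MAX;
    else
        val = (long) rlim.rlim_cur;

#ifdef __IA64__
    /* Leave headroom for the register stack, which is not measured. */
    val /= 8;
#endif

    assert(val >= 0);
    return val;
}
```

With ulimit -s at 8MB this would report 1MB on IA64. At the gcc rates
quoted earlier, 1MB of normal stack corresponds to roughly 5.8MB of
register stack, which stays under an 8MB register stack limit.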

BTW, this also suggests to me that it'd be a real good idea to have
a buildfarm critter for IA64+gcc --- the differences between gcc and
icc are clearly pretty significant on this hardware.

regards, tom lane
