Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge
Date: 2011-09-03 05:54:33
Message-ID: alpine.BSO.2.00.1109022239230.27326@resin.csoft.net
Lists: pgsql-committers pgsql-hackers

On Fri, 2 Sep 2011, Tom Lane wrote:

> Yeah, so the next question would be why those other ones aren't showing
> problems. But at least now we have a potential mechanism for getting
> from "the include list changed" to "cube is crashing on an offsetof",
> namely that something is affecting the expansion of the offsetof macro.
> Up to now it's been black magic, and I don't like patching around
> problems we don't understand any better than that.

I'm going to try answering about 3 emails at once here, so I apologize for
the confusion.

Checking preprocessor output on cube_f8_f8 between working and broken, the
output was identical for that function.

Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x) made no change in
behavior.
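
A minimal sketch of why the two spellings would be expected to behave the same. The struct below is a hypothetical stand-in for NDBOX (the field layout is an assumption, not copied from cube's headers), and the function names are made up for illustration. With 4-byte int and 8-byte double, both functions should return 24, which would line up with the "push 0x18" in the cube_f8_f8 disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cube's NDBOX; the field layout is an assumption. */
typedef struct NDBOX_like
{
    int32_t      vl_len_;   /* varlena length word */
    unsigned int dim;       /* number of dimensions */
    double       x[1];      /* coordinates, over-allocated at runtime */
} NDBOX_like;

/* Size computed from the offset of the array member itself. */
size_t size_via_array(void)
{
    return offsetof(NDBOX_like, x) + sizeof(double) * 2;
}

/* Size computed from the offset of the first array element. */
size_t size_via_element(void)
{
    return offsetof(NDBOX_like, x[0]) + sizeof(double) * 2;
}

/* Aborts if the two spellings ever disagree. */
void check_spellings_agree(void)
{
    assert(size_via_array() == size_via_element());
}
```

Calling check_spellings_agree() from a small test program is enough to confirm that the array and its first element start at the same offset, so the choice of spelling should not affect the computed size.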

Then I started investigating the disassembly. The PIC disassembly was
making my head hurt (why is the compiler calling the next instruction,
popping the return address into a register, and using it as a pointer?
Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.
The odd thing is that the disassembly of the working and non-working
builds was the same, except for the offsets.

Here's the disassembly output from the core dump:
Program terminated with signal 11, Segmentation fault.
#0 cube_f8_f8 () at cube.c:1435
1435 size = offsetof(NDBOX, x) +sizeof(double) * 2;
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function cube_f8_f8:
0xb776ab50 <+0>: push ebp
0xb776ab51 <+1>: mov ebp,esp
0xb776ab53 <+3>: and esp,0xfffffff8
0xb776ab56 <+6>: push ebx
0xb776ab57 <+7>: sub esp,0x14
0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8]
0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx
0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14]
0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx]
0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18]
0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx]
0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0
0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1
0xb776ab7a <+42>: push 0x18
0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49>
0xb776ab81 <+49>: pop eax
0xb776ab82 <+50>: add eax,0x9472
0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58]
=> 0xb776ab8d <+61>: push DWORD PTR [edx]
0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx
0xb776ab93 <+67>: mov ebx,eax
0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt>

With this knowledge, I thought maybe cube wasn't the actual problem, but
just happened to be early in the list of contrib modules being tested. So
I arbitrarily picked another contrib module to test, intarray.

============== running regression test queries ==============
test _int ... FAILED (test process exited with exit code 2)

Program terminated with signal 11, Segmentation fault.
#0 0xb785a1fa in g_intbig_union ()
   from /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function g_intbig_union:
<SNIP UNRELATED ASSEMBLY CODE>
0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36>
0xb785a08c <+36>: pop eax
0xb785a08d <+37>: add eax,0x6f67
0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax
<SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104]
0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24]
0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx
0xb785a1f9 <+401>: push ebx
=> 0xb785a1fa <+402>: push DWORD PTR [esi]
0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx
0xb785a203 <+411>: mov ebx,eax
0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt>

Looks pretty familiar, right?
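
A minimal sketch of the address arithmetic in the two listings, assuming the call/pop/add sequence is the usual i386 PIC idiom: the popped return address plus the added constant gives a base pointer, and the following mov loads a pointer from a slot a fixed distance below that base. That the slot holds the address of CurrentMemoryContext is my reading of the code, not something the dump confirms. The function names are made up for illustration; the constants are taken from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Value left in the register after "call next; pop reg; add reg, disp". */
uint32_t pic_base(uint32_t return_addr, uint32_t disp)
{
    return return_addr + disp;
}

/* Address read by "mov reg2, DWORD PTR [reg - below]". */
uint32_t slot_addr(uint32_t base, uint32_t below)
{
    assert(below <= base);  /* the offset must not wrap below zero */
    return base - below;
}

/* cube_f8_f8: pop at 0xb776ab81, add 0x9472, load from [eax-0x58]. */
uint32_t cube_slot(void)
{
    return slot_addr(pic_base(0xb776ab81u, 0x9472u), 0x58u);
}

/* g_intbig_union: pop at 0xb785a08c, add 0x6f67, load from [eax-0x24]. */
uint32_t intarray_slot(void)
{
    return slot_addr(pic_base(0xb785a08cu, 0x6f67u), 0x24u);
}
```

With the values from the dumps, the bases work out to 0xb7773ff3 for cube and 0xb7860ff3 for intarray, and the slots to 0xb7773f9b and 0xb7860fcf. In both cases the faulting push dereferences the pointer loaded from that slot.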

Still, I have no idea why adding an include would cause issues accessing
CurrentMemoryContext. But at least we're not blaming offsetof or cube
anymore...
