Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge
Date: 2011-09-03 14:35:41
Message-ID: 201109031435.p83EZfm18481@momjian.us
Lists: pgsql-committers pgsql-hackers


Did you test with a lower optimization level? Gentoo is notorious for
buggy, bleeding-edge tools.

---------------------------------------------------------------------------

Jeremy Drake wrote:
> On Fri, 2 Sep 2011, Tom Lane wrote:
>
> > Yeah, so the next question would be why those other ones aren't showing
> > problems. But at least now we have a potential mechanism for getting
> > from "the include list changed" to "cube is crashing on an offsetof",
> > namely that something is affecting the expansion of the offsetof macro.
> > Up to now it's been black magic, and I don't like patching around
> > problems we don't understand any better than that.
>
>
> I'm going to try answering about 3 emails at once here, so I apologize for
> the confusion.
>
> Checking preprocessor output on cube_f8_f8 between working and broken, the
> output was identical for that function.
>
> Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x), no change in
> behavior.
>
> Then I started investigating the disassembly. The PIC disassembly was
> making my head hurt (why is the compiler calling the next instruction,
> popping the return address into a register, and using it as a pointer?
> Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
> attempting to access the global external variable CurrentMemoryContext.
> The odd thing is that the disassembly of the working and non-working
> builds was the same, except for the offsets.
>
> Here's the disassembly output from the core dump:
> Program terminated with signal 11, Segmentation fault.
> #0 cube_f8_f8 () at cube.c:1435
> 1435 size = offsetof(NDBOX, x) +sizeof(double) * 2;
> (gdb) set disassembly-flavor intel
> (gdb) disass
> Dump of assembler code for function cube_f8_f8:
> 0xb776ab50 <+0>: push ebp
> 0xb776ab51 <+1>: mov ebp,esp
> 0xb776ab53 <+3>: and esp,0xfffffff8
> 0xb776ab56 <+6>: push ebx
> 0xb776ab57 <+7>: sub esp,0x14
> 0xb776ab5a <+10>: mov ecx,DWORD PTR [ebp+0x8]
> 0xb776ab5d <+13>: mov DWORD PTR [esp+0x10],ebx
> 0xb776ab61 <+17>: mov edx,DWORD PTR [ecx+0x14]
> 0xb776ab64 <+20>: movsd xmm0,QWORD PTR [edx]
> 0xb776ab68 <+24>: mov edx,DWORD PTR [ecx+0x18]
> 0xb776ab6b <+27>: movsd xmm1,QWORD PTR [edx]
> 0xb776ab6f <+31>: movsd QWORD PTR [esp],xmm0
> 0xb776ab74 <+36>: movsd QWORD PTR [esp+0x8],xmm1
> 0xb776ab7a <+42>: push 0x18
> 0xb776ab7c <+44>: call 0xb776ab81 <cube_f8_f8+49>
> 0xb776ab81 <+49>: pop eax
> 0xb776ab82 <+50>: add eax,0x9472
> 0xb776ab87 <+55>: mov edx,DWORD PTR [eax-0x58]
> => 0xb776ab8d <+61>: push DWORD PTR [edx]
> 0xb776ab8f <+63>: mov DWORD PTR [esp+0x18],ebx
> 0xb776ab93 <+67>: mov ebx,eax
> 0xb776ab95 <+69>: call 0xb776745c <MemoryContextAllocZero@plt>
>
>
>
> With this knowledge, I thought maybe cube wasn't the actual problem, but
> just happened to be early in the list of contrib modules being tested. So
> I arbitrarily picked another contrib module to test, intarray.
>
> ============== running regression test queries ==============
> test _int ... FAILED (test process exited with exit code 2)
>
> Program terminated with signal 11, Segmentation fault.
> #0 0xb785a1fa in g_intbig_union ()
> from /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
> (gdb) set disassembly-flavor intel
> (gdb) disass
> Dump of assembler code for function g_intbig_union:
> <SNIP UNRELATED ASSEMBLY CODE>
> 0xb785a087 <+31>: call 0xb785a08c <g_intbig_union+36>
> 0xb785a08c <+36>: pop eax
> 0xb785a08d <+37>: add eax,0x6f67
> 0xb785a092 <+42>: mov DWORD PTR [esp+0x104],eax
> <SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
> 0xb785a1e5 <+381>: mov eax,DWORD PTR [esp+0x104]
> 0xb785a1ec <+388>: mov esi,DWORD PTR [eax-0x24]
> 0xb785a1f2 <+394>: mov DWORD PTR [esp+0x108],ebx
> 0xb785a1f9 <+401>: push ebx
> => 0xb785a1fa <+402>: push DWORD PTR [esi]
> 0xb785a1fc <+404>: mov DWORD PTR [esp+0x104],ebx
> 0xb785a203 <+411>: mov ebx,eax
> 0xb785a205 <+413>: call 0xb7853ef0 <MemoryContextAlloc@plt>
>
>
> Looks pretty familiar, right?
>
> Still, I have no idea why adding an include would cause issues accessing
> CurrentMemoryContext. But at least we're not blaming offsetof or cube
> anymore...
>
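
A side note on the offsetof experiment quoted above: swapping
offsetof(NDBOX, x[0]) for offsetof(NDBOX, x) would not be expected to
change anything, since both name the start of the same array. The
sketch below uses a hypothetical stand-in struct (NDBOX_sketch, whose
field layout is an assumption and not copied from cube's headers) to
show the two forms agreeing and to redo the size computation from line
1435. With a 4-byte int and an 8-byte double the result is 24, which
appears to match the "push 0x18" just before the crash point in the
first disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cube's NDBOX; the field layout is an assumption. */
typedef struct NDBOX_sketch
{
    int32_t      vl_len_;   /* varlena length word */
    unsigned int dim;       /* number of dimensions */
    double       x[1];      /* coordinates, allocated past the struct's end */
} NDBOX_sketch;

/* Header size using the array member itself, as in the modified code. */
size_t header_size_via_array(void)
{
    return offsetof(NDBOX_sketch, x);
}

/* Header size using the first array element, as in the original code. */
size_t header_size_via_element(void)
{
    return offsetof(NDBOX_sketch, x[0]);
}

/* Allocation size in the style of cube.c line 1435: header plus two doubles. */
size_t cube_f8_f8_alloc_size(void)
{
    /* Both spellings must name the same byte offset. */
    assert(header_size_via_array() == header_size_via_element());
    return header_size_via_array() + sizeof(double) * 2;
}
```

Both offsetof forms are compile-time constants, so a compiler can fold
the whole expression into an immediate operand. That fits the report
that the crash is on the load before the allocation call and not on
the size computation.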

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
