Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Remove "fmgr.h" include in cube contrib --- caused crash on a Ge
Date: 2011-09-03 05:54:33
Message-ID: alpine.BSO.2.00.1109022239230.27326@resin.csoft.net
Lists: pgsql-committers, pgsql-hackers

On Fri, 2 Sep 2011, Tom Lane wrote:

> Yeah, so the next question would be why those other ones aren't showing
> problems.  But at least now we have a potential mechanism for getting
> from "the include list changed" to "cube is crashing on an offsetof",
> namely that something is affecting the expansion of the offsetof macro.
> Up to now it's been black magic, and I don't like patching around
> problems we don't understand any better than that.


I'm going to try answering about 3 emails at once here, so I apologize for
the confusion.

I compared the preprocessor output for cube_f8_f8 between the working and
broken builds; it was identical for that function.

Changing offsetof(NDBOX, x[0]) to offsetof(NDBOX, x) made no difference in
behavior.

Then I started investigating the disassembly.  The PIC disassembly was
making my head hurt (why is the compiler calling the next instruction,
popping the return address into a register, and using it as a pointer?
Oh yeah, PIC, duh!), but I'm pretty sure that what it crashed on was
attempting to access the global external variable CurrentMemoryContext.
The odd thing is that the disassembly of the working and non-working
builds was the same, except for the offsets.

Here's the disassembly output from the core dump:
Program terminated with signal 11, Segmentation fault.
#0  cube_f8_f8 () at cube.c:1435
1435		size = offsetof(NDBOX, x) +sizeof(double) * 2;
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function cube_f8_f8:
   0xb776ab50 <+0>:	push   ebp
   0xb776ab51 <+1>:	mov    ebp,esp
   0xb776ab53 <+3>:	and    esp,0xfffffff8
   0xb776ab56 <+6>:	push   ebx
   0xb776ab57 <+7>:	sub    esp,0x14
   0xb776ab5a <+10>:	mov    ecx,DWORD PTR [ebp+0x8]
   0xb776ab5d <+13>:	mov    DWORD PTR [esp+0x10],ebx
   0xb776ab61 <+17>:	mov    edx,DWORD PTR [ecx+0x14]
   0xb776ab64 <+20>:	movsd  xmm0,QWORD PTR [edx]
   0xb776ab68 <+24>:	mov    edx,DWORD PTR [ecx+0x18]
   0xb776ab6b <+27>:	movsd  xmm1,QWORD PTR [edx]
   0xb776ab6f <+31>:	movsd  QWORD PTR [esp],xmm0
   0xb776ab74 <+36>:	movsd  QWORD PTR [esp+0x8],xmm1
   0xb776ab7a <+42>:	push   0x18
   0xb776ab7c <+44>:	call   0xb776ab81 <cube_f8_f8+49>
   0xb776ab81 <+49>:	pop    eax
   0xb776ab82 <+50>:	add    eax,0x9472
   0xb776ab87 <+55>:	mov    edx,DWORD PTR [eax-0x58]
=> 0xb776ab8d <+61>:	push   DWORD PTR [edx]
   0xb776ab8f <+63>:	mov    DWORD PTR [esp+0x18],ebx
   0xb776ab93 <+67>:	mov    ebx,eax
   0xb776ab95 <+69>:	call   0xb776745c <MemoryContextAllocZero@plt>



With this knowledge, I thought maybe cube wasn't the actual problem, but
just happened to be early in the list of contrib modules being tested.  So
I arbitrarily picked another contrib module to test, intarray.

============== running regression test queries        ==============
test _int                     ... FAILED (test process exited with exit code 2)

Program terminated with signal 11, Segmentation fault.
#0  0xb785a1fa in g_intbig_union ()
   from /buildfarm/test/pgsql_test/contrib/intarray/tmp_check/install/usr/local/pgsql/lib/_int.so
(gdb) set disassembly-flavor intel
(gdb) disass
Dump of assembler code for function g_intbig_union:
<SNIP UNRELATED ASSEMBLY CODE>
   0xb785a087 <+31>:	call   0xb785a08c <g_intbig_union+36>
   0xb785a08c <+36>:	pop    eax
   0xb785a08d <+37>:	add    eax,0x6f67
   0xb785a092 <+42>:	mov    DWORD PTR [esp+0x104],eax
<SNIP MORE UNRELATED ASSEMBLY CODE (this is a much longer function)>
   0xb785a1e5 <+381>:	mov    eax,DWORD PTR [esp+0x104]
   0xb785a1ec <+388>:	mov    esi,DWORD PTR [eax-0x24]
   0xb785a1f2 <+394>:	mov    DWORD PTR [esp+0x108],ebx
   0xb785a1f9 <+401>:	push   ebx
=> 0xb785a1fa <+402>:	push   DWORD PTR [esi]
   0xb785a1fc <+404>:	mov    DWORD PTR [esp+0x104],ebx
   0xb785a203 <+411>:	mov    ebx,eax
   0xb785a205 <+413>:	call   0xb7853ef0 <MemoryContextAlloc@plt>


Looks pretty familiar, right?

Still, I have no idea why adding an include would cause issues accessing
CurrentMemoryContext.  But at least we're not blaming offsetof or cube
anymore...


