Re: PosgreSQL is crashing with a signal 11 - Bug?

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: PosgreSQL is crashing with a signal 11 - Bug?
Date: 2004-09-07 21:20:44
Message-ID: 1094592043.5232.38.camel@linux.site
Lists: pgsql-bugs

On Tue, 2004-09-07 at 19:58, Tom Lane wrote:

> Rafael Martinez Guerrero <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
> > * Information from CORE dump we got with --enable-debug. We have
> > compiled a new version of postgres and run it through gdb with the core
> > dump we had/got from postgres without --enable-debug.
>
> Okay, theoretically that works, but it might be smarter to install the
> debug build and get a fresh core dump that definitely corresponds to it.
>

It is late in Norway and we need to sleep; we will try this tomorrow
morning.

> > #0 0xb734d07c in memcpy () from /lib/tls/libc.so.6
>
> > #1 0x0806bba8 in DataFill (data=0xb7489000 <Address 0xb7489000 out of
> > bounds>, tupleDesc=0x82fd554, value=0x82fd550, nulls=0xbfff7ec0 " n ",
> > infomask=0x836e904c, bit=0x836e904f "\003\f") at heaptuple.c:139
>
> If accurate, that says it's crashing here:
>
> /* fixed-length pass-by-reference */
> Assert(att[i]->attlen > 0);
> data_length = att[i]->attlen;
> --> memcpy(data, DatumGetPointer(value[i]), data_length);
>
> which suggests either that att[i]->attlen is corrupt, or that the
> computed length for the preceding column was wacko (leading to the
> data pointer being moved to a silly address), or that the provided
> value[i] is wrong. In the context at hand none of these seem especially
> likely, but one of them must be the case. Can you look with gdb to
> see what the value of i is, and print out the contents of the *(att[i])
> struct? Also look at "data" and "value[i]" to see if they are sensible
> pointers or not.
>

I got this from one of our developers (from the core dump generated by
7.3.7 without --enable-debug):
--------------------------------------
(gdb) inspect i
$1 = 1

(gdb) inspect att[i]
$2 = 0x82fd6e8

(gdb) inspect *att[i]
$3 = {attrelid = 0, attname = {data = '\0' <repeats 63 times>,
alignmentDummy = 0}, atttypid = 1700, attstattarget = -1, attlen = -1,
attnum = 2, attndims = 0, attcacheoff = -1, atttypmod = 393220, attbyval
= 0 '\0', attstorage = 109 'm', attisset = 0 '\0', attalign = 105 'i',
attnotnull = 0 '\0', atthasdef = 0 '\0', attisdropped = 0 '\0',
attislocal = 1 '\001', attinhcount = 0}

(gdb) inspect data
$4 = 0xb7489000 <Address 0xb7489000 out of bounds>

(gdb) inspect value[i]
$5 = 3054556648
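
For readers following the gdb output, here is a minimal sketch of how a per-attribute copy routine like DataFill might choose its branch. It assumes the branch order is by-value first, then varlena (attlen = -1), then cstring (attlen = -2), then fixed-length pass-by-reference. The names StoreKind and classify_attr are hypothetical and are not PostgreSQL identifiers; the real code should be checked in heaptuple.c.

```c
#include <assert.h>

/* Hypothetical names for this sketch; not PostgreSQL identifiers. */
typedef enum {
    STORE_BY_VALUE,      /* attbyval is true: the Datum itself is stored */
    STORE_VARLENA,       /* attlen == -1: length is read from the value's header */
    STORE_CSTRING,       /* attlen == -2: NUL-terminated string */
    STORE_FIXED_BY_REF   /* attlen > 0: memcpy of exactly attlen bytes */
} StoreKind;

/*
 * Pick the storage branch from the two attribute fields printed by gdb.
 * The branch order is an assumption made for this sketch.
 */
StoreKind classify_attr(int attbyval, int attlen)
{
    if (attbyval)
        return STORE_BY_VALUE;
    if (attlen == -1)
        return STORE_VARLENA;
    if (attlen == -2)
        return STORE_CSTRING;

    /* Mirrors the Assert quoted above for the fixed-length branch. */
    assert(attlen > 0);
    return STORE_FIXED_BY_REF;
}
```

With the values from the session above (attbyval = 0, attlen = -1), classify_attr(0, -1) returns STORE_VARLENA, not STORE_FIXED_BY_REF. Under this sketch's assumptions, that would mean the reported line in the fixed-length branch may not be the real crash site, which is consistent with the caveat that the core dump and the debug build might not correspond.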

> How reproducible is the crash --- does it happen every time you execute
> this particular FETCH?
>

We are not sure about this. We did not log as much as we should have in
the beginning. One thing is sure: the last time, it happened after this
FETCH. We have full logging on now, and we will be able to find out more
about this if/when it crashes again.

> regards, tom lane

Thanks for your help. I hope you/we will be able to figure this out;
right now this is a big crisis for us.

--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway
