Still crashing with latest 7.0.2 (Re: (forw) more crashes)

From: Alfred Perlstein <bright(at)wintelcom(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Still crashing with latest 7.0.2 (Re: (forw) more crashes)
Date: 2000-10-08 10:48:34
Message-ID: 20001008034834.C272@fw.wintelcom.net
Lists: pgsql-hackers

* Alfred Perlstein <bright(at)wintelcom(dot)net> [001006 16:02] wrote:
> * Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [001004 09:56] wrote:
> > Alfred Perlstein <bright(at)wintelcom(dot)net> writes:
> > > I have a reliable way to make postgresql crash after a
> > > couple of hours over here and a backtrace that looks like a good
> > > catch.
> >
> > I'm interested in pursuing this, but the backtrace doesn't give enough
> > info to debug it. It looks like the backend is crashing because of
> > a previously-corrupted tuple, so what we'll need to do is work backwards
> > to find where the data corruption is occurring.
> >
> > Can you boil down the test sequence to something that could be
> > reproduced by other people? The most convenient way to work on it
> > would be to see it happen here...
>
> I just wanted to note on the list that these crashes seem to have
> stopped with the latest 7.0.2-patches (as of around 11:30 PM EST,
> Oct 4th). It's been over 24 hours since the upgrade (previously I
> couldn't go more than 20 hours without a crash).
>
> My only concern is that I didn't notice anything on the CVS list
> that referenced a fix for crashes.
>
> Anyway, I'll post an update in a couple of days on whether all is
> well or not.

Unfortunately, I'm still getting crashes. This one looks like it
happened during a VACUUM; previously I got a crash while doing an
UPDATE, but in exactly the same spot. It took quite a bit longer to
provoke this time:

-rw------- 1 pgsql pgsql 277561344 Oct 8 02:56 postgres.core

#0 0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3,
tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
537 off = att_addlength(off, att[i]->attlen, tp + off);
(gdb) bt
#0 0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3,
tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
#1 0x8075851 in GetIndexValue (tuple=0xbfbfe974, hTupDesc=0x84ca368,
attOff=3, attrNums=0x8508240, fInfo=0x0, attNull=0xbfbfe7fb "")
at indexam.c:445
#2 0x80903be in FormIndexDatum (numberOfAttributes=4,
attributeNumber=0x8508240, heapTuple=0xbfbfe974, heapDescriptor=0x84ca368,
datum=0x8508018, nullv=0x84ba170 " ", fInfo=0x0) at index.c:1256
#3 0x80a05e6 in vc_repair_frag (vacrelstats=0x84ba290, onerel=0x84c6788,
vacuum_pages=0xbfbfea1c, fraged_pages=0xbfbfea0c, nindices=1,
Irel=0x84ba118) at vacuum.c:1634
#4 0x809e3b9 in vc_vacone (relid=1315147913, analyze=0, va_cols=0x0)
at vacuum.c:640
#5 0x809d9ac in vc_vacuum (VacRelP=0xbfbfeaac, analyze=0 '\000', va_cols=0x0)
at vacuum.c:299
#6 0x809d934 in vacuum (vacrel=0x84ba0e8 "\030", verbose=1, analyze=0 '\000',
va_spec=0x0) at vacuum.c:223
#7 0x810ca8c in ProcessUtility (parsetree=0x84ba110, dest=Remote)
at utility.c:694
#8 0x810a44e in pg_exec_query_dest (
query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;",
dest=Remote, aclOverride=0) at postgres.c:617
#9 0x810a3a9 in pg_exec_query (
query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;")
at postgres.c:562
#10 0x810b336 in PostgresMain (argc=7, argv=0xbfbff12c, real_argc=10,
real_argv=0xbfbffb8c) at postgres.c:1588
#11 0x80f0742 in DoBackend (port=0x8464000) at postmaster.c:2009
#12 0x80f02d5 in BackendStartup (port=0x8464000) at postmaster.c:1776
#13 0x80ef4f9 in ServerLoop () at postmaster.c:1037
#14 0x80eeede in PostmasterMain (argc=10, argv=0xbfbffb8c) at postmaster.c:725
#15 0x80bf3eb in main (argc=10, argv=0xbfbffb8c) at main.c:93
#16 0x8063495 in _start ()
(gdb) list
532
533 if (usecache)
534 att[i]->attcacheoff = off;
535 }
536
537 off = att_addlength(off, att[i]->attlen, tp + off);
538
539 if (usecache &&
540 att[i]->attlen == -1 && !VARLENA_FIXED_SIZE(att[i]))
541 usecache = false;
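The listed loop locates attribute attnum by walking every earlier attribute: each column is aligned, then att_addlength advances off by that column's length, and for a variable-length column (attlen == -1) that length is read from the tuple data itself. So one bad length word earlier in the row is enough to send off anywhere. A minimal sketch of the walk, assuming a simplified row where every column is 4-byte aligned and each varlena starts with a 4-byte total-length word; ColSketch and column_offset are hypothetical names, and the bounds checks are additions (the listed loop performs none):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor: len > 0 is a fixed-width
 * column of that many bytes; len == -1 is a varlena whose first 4 bytes
 * hold its total size, header included. */
typedef struct {
    int len;
} ColSketch;

/* Round off up to a multiple of 4 (every column is assumed int-aligned). */
static long align_int(long off)
{
    return (off + 3) & ~3L;
}

/* Return the byte offset of column `target` (0-based) within the data area
 * `tp` of `tp_len` bytes by walking every earlier column. Unlike the listed
 * loop, it returns -1 instead of trusting a length word that would move the
 * offset outside the data area. */
static long column_offset(const ColSketch *cols, int target,
                          const unsigned char *tp, long tp_len)
{
    long off = 0;

    for (int i = 0; i < target; i++) {
        off = align_int(off);
        if (cols[i].len > 0) {
            off += cols[i].len;                     /* fixed width: from the descriptor */
        } else {
            int32_t vsize;

            if (off + (long) sizeof vsize > tp_len)
                return -1;
            memcpy(&vsize, tp + off, sizeof vsize); /* varlena: length read from the row */
            if (vsize < (int32_t) sizeof vsize || off + vsize > tp_len)
                return -1;                          /* corrupt length word */
            off += vsize;
        }
        if (off > tp_len)
            return -1;
    }
    off = align_int(off);
    return off < tp_len ? off : -1;
}
```

With a sane row (an int4 followed by two short varlenas) the walk lands on the third column at offset 12; overwrite the second column's length word with garbage and the sketch bails out, whereas the listed loop would carry the bogus value forward in off.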

It looks like it's dying in the same place as the previous core dumps;
however, this one happened during a VACUUM rather than an UPDATE:

(gdb) print off
$1 = -838833616
(gdb) print att[i]
$2 = 0x84ca640
(gdb) print *(att[i])
$3 = {attrelid = 1315147913, attname = {
data = "attr_name", '\000' <repeats 22 times>,
alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36,
attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print i
$4 = 2
(gdb) print tp
$5 = 0x5808eba5 "Yj"
(gdb) print tp+off
$6 = 0x260955d5 <Address 0x260955d5 out of bounds>

ack!
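The out-of-bounds address is plain 32-bit arithmetic: off is hugely negative (about 800 MB), so tp + off lands far below tp. The values printed above can be checked directly. Viewed as an unsigned word, off is 0xCE006A30, which looks more like garbage read from the row than a legitimately accumulated offset (an inference, not something the dump proves). The helper names below are hypothetical:

```c
#include <stdint.h>

/* Values copied from the gdb session above (a 32-bit backend). */
static const int32_t gdb_off = -838833616;  /* (gdb) print off */
static const uint32_t gdb_tp = 0x5808eba5u; /* (gdb) print tp  */

/* The same 32 bits viewed as unsigned, i.e. what print/x off would show. */
static uint32_t as_u32(int32_t v)
{
    return (uint32_t) v;
}

/* tp + off on a 32-bit machine: the address wraps modulo 2^32. */
static uint32_t add_wrapped(uint32_t base, int32_t off)
{
    return base + (uint32_t) off;
}
```

add_wrapped(gdb_tp, gdb_off) reproduces the 0x260955d5 that gdb reported for tp+off.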

(gdb) print usecache
$7 = 0 '\000'
(gdb) print attnum
$8 = 3
(gdb) print slow
$9 = 139159376
(gdb) print *slow
$10 = 139241024
(gdb) print (char *) tup + tup->t_hoff
$11 = 0x5808eba5 "Yj"
(gdb) print tup
$12 = 0x5808eba0
(gdb) print *tup
$13 = {t_oid = 0, t_cmin = 6969654, t_cmax = 6958161, t_xmin = 1742,
t_xmax = 6955895, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 639},
ip_posid = 84}, t_natts = 737, t_infomask = 32846, t_hoff = 5 '\005',
t_bits = "\000\002¥ "}
(gdb) print *tupleDesc
$14 = {natts = 1358981721, attrs = 0xce006a2c, constr = 0x77000006}
(gdb) print *(att[0])
$15 = {attrelid = 1315147913, attname = {
data = "counter_id", '\000' <repeats 21 times>,
alignmentDummy = 1853189987}, atttypid = 23, attdisbursion = 0,
attlen = 4, attnum = 1, attnelems = 0, attcacheoff = 0, atttypmod = -1,
attbyval = 1 '\001', attstorage = 112 'p', attisset = 0 '\000',
attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[1])
$16 = {attrelid = 1315147913, attname = {
data = "attr_type", '\000' <repeats 22 times>,
alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
attlen = -1, attnum = 2, attnelems = 0, attcacheoff = 4, atttypmod = 36,
attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[2])
$17 = {attrelid = 1315147913, attname = {
data = "attr_name", '\000' <repeats 22 times>,
alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36,
attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[3])
$18 = {attrelid = 1315147913, attname = {
data = "attr_vers", '\000' <repeats 22 times>,
alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0,
attlen = -1, attnum = 4, attnelems = 0, attcacheoff = -1, atttypmod = 36,
attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[4])
$19 = {attrelid = 1315147913, attname = {
data = "attr_hits", '\000' <repeats 22 times>,
alignmentDummy = 1920234593}, atttypid = 20, attdisbursion = 0,
attlen = 8, attnum = 5, attnelems = 0, attcacheoff = -1, atttypmod = -1,
attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000',
attalign = 100 'd', attnotnull = 0 '\000', atthasdef = 1 '\001'}
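The three varchar columns (atttypid 1043) all carry atttypmod = 36. For varchar the type modifier is the declared length plus the 4-byte varlena header, so these are varchar(32) columns, and, assuming a single-byte encoding, no valid attr_type, attr_name, or attr_vers value should have a length word above 36. A tiny sketch of that bound; the function names and VARHDRSZ_SKETCH (standing in for the backend's VARHDRSZ) are hypothetical:

```c
#include <stdint.h>

#define VARHDRSZ_SKETCH 4 /* assumed size of the varlena length word */

/* varchar(n) is stored with atttypmod = n + header size; recover n. */
static int32_t varchar_declared_len(int32_t atttypmod)
{
    return atttypmod - VARHDRSZ_SKETCH;
}

/* Is a stored length word (header included) plausible for a varchar column?
 * Assumes a single-byte encoding, so n characters occupy n bytes. */
static int varchar_len_plausible(int32_t vl_len, int32_t atttypmod)
{
    return vl_len >= VARHDRSZ_SKETCH && vl_len <= atttypmod;
}
```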
(gdb) print *tuple
$20 = {t_len = 80, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 640},
ip_posid = 5}, t_datamcxt = 0x0, t_data = 0x5808eba0}
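Two header fields printed for *tup are implausible on their face: t_natts is 737 although only five columns are printed, and t_hoff is 5 even though it is the offset of the user data from the start of the header (compare print (char *) tup + tup->t_hoff) and the fixed fields that precede it take far more than 5 bytes. A minimal sanity check along those lines; TupleHeaderSketch, check_header, and the HDR_* flags are hypothetical, and the struct only mirrors the field names gdb printed, not the exact 7.0 layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields gdb printed for *tup; t_ctid is
 * flattened into three fields. Not the real HeapTupleHeaderData layout. */
typedef struct {
    uint32_t t_oid;
    uint32_t t_cmin;
    uint32_t t_cmax;
    uint32_t t_xmin;
    uint32_t t_xmax;
    uint16_t ctid_bi_hi, ctid_bi_lo, ctid_posid;
    int16_t t_natts;
    uint16_t t_infomask;
    uint8_t t_hoff;
    uint8_t t_bits[1];
} TupleHeaderSketch;

/* Problems reported by check_header(), as a bitmask. */
enum {
    HDR_BAD_NATTS = 1 << 0,      /* t_natts negative or above the table's column count */
    HDR_HOFF_TOO_SMALL = 1 << 1, /* user data would start inside the fixed header */
    HDR_HOFF_PAST_END = 1 << 2   /* user data would start beyond the tuple */
};

/* Return 0 if the header looks plausible for a table with rel_natts
 * columns and a tuple of t_len bytes, else a mask of HDR_* flags. */
static int check_header(const TupleHeaderSketch *h, int rel_natts, uint32_t t_len)
{
    int problems = 0;

    if (h->t_natts < 0 || h->t_natts > rel_natts)
        problems |= HDR_BAD_NATTS;
    if ((size_t) h->t_hoff < offsetof(TupleHeaderSketch, t_bits))
        problems |= HDR_HOFF_TOO_SMALL;
    if ((uint32_t) h->t_hoff > t_len)
        problems |= HDR_HOFF_PAST_END;
    return problems;
}
```

Fed the printed values (t_natts = 737, t_hoff = 5, t_len = 80) and assuming the table has only the five columns shown, both the attribute-count and the header-offset checks fire.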

thanks,
--
-Alfred Perlstein - [bright(at)wintelcom(dot)net|alfred(at)freebsd(dot)org]
"I have the heart of a child; I keep it in a jar on my desk."
