Backend Core Dumping

From: Kristofer Munn <kmunn(at)munn(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Backend Core Dumping
Date: 2000-01-28 02:28:05
Message-ID: Pine.LNX.4.04.10001272118430.1275-100000@munn.com
Lists: pgsql-hackers

Hi. I have a reproducible backend crash on my system (PostgreSQL 6.5.3 on
Intel) when selecting a particular tuple from a particular table. My guess
is that something went wrong in the tuple itself and the code below is now
failing on it. I get between 6 and 20 spinlock crashes on my box each day;
they look like this in /var/log/messages:

Jan 23 08:02:46 mymailman logger: FATAL: s_lock(401dcc34) at bufmgr.c:1106, stuck spinlock. Aborting.
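The FATAL line is the backend's spinlock code giving up on a lock it could not acquire. As a rough illustration only (this is not PostgreSQL's s_lock.c; the names s_lock_sketch, s_unlock_sketch, sketch_slock_t and the SKETCH_MAX_SPINS limit are hypothetical), a C11 test-and-set spinlock that declares itself stuck after a bounded number of attempts might look like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical limit for this sketch only; the real backend's spin count
 * and back-off differ, and when it gives up the process aborts. */
#define SKETCH_MAX_SPINS 100000L

typedef atomic_flag sketch_slock_t;

/* Try to take the lock with test-and-set. Returns 0 once acquired, or -1
 * after SKETCH_MAX_SPINS failed attempts, printing a message shaped like
 * the log line above. */
static int s_lock_sketch(sketch_slock_t *lock, const char *file, int line)
{
    long spins = 0;

    assert(lock != NULL);
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        if (++spins >= SKETCH_MAX_SPINS) {
            fprintf(stderr, "FATAL: s_lock(%p) at %s:%d, stuck spinlock. Aborting.\n",
                    (void *) lock, file, line);
            return -1; /* a real backend would abort() here instead */
        }
    }
    return 0;
}

/* Release the lock so the next test-and-set can succeed. */
static void s_unlock_sketch(sketch_slock_t *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

Taking the lock once succeeds; taking it a second time without calling s_unlock_sketch spins until the limit and reports a stuck spinlock, which is the situation the log line describes: the lock at bufmgr.c:1106 was not released within the allowed time.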

This isn't the first time I've had this issue. One solution is to delete
the affected tuple and vacuum, but VACUUM spins indefinitely and the file
access dates in the database directory never change.

I loaded the core into gdb, and here's the result:

(gdb) bt
#0 0x8062354 in nocachegetattr (tuple=0x823278c, attnum=1, tupleDesc=0x81d70b8, isnull=0xbfffa3c7 "") at heaptuple.c:492
#1 0x8092cf2 in ExecEvalVar (variable=0x8230968, econtext=0x8232730, isNull=0xbfffa3c7 "") at execQual.c:295
#2 0x8093810 in ExecEvalExpr (expression=0x8230968, econtext=0x8232730, isNull=0xbfffa3c7 "", isDone=0xbfffa48b "\001\234\210\t\bp\035#\b\207\t\b&\t\bp\035#\b") at execQual.c:1208
#3 0x8093a3d in ExecTargetList (targetlist=0x8231a98, nodomains=14, targettype=0x8232800, values=0x8233a98, econtext=0x8232730, isDone=0xbfffa48b "\001\234\210\t\bp\035#\b\207\t\b&\t\bp\035#\b") at execQual.c:1495
#4 0x8093bed in ExecProject (projInfo=0x8231cc0, isDone=0xbfffa48b "\001\234\210\t\bp\035#\b\207\t\b&\t\bp\035#\b") at execQual.c:1621
#5 0x8093ca1 in ExecScan (node=0x8231d70, accessMtd=0x80987a0 <SeqNext>) at execScan.c:153
#6 0x80988bb in ExecSeqScan (node=0x8231d70) at nodeSeqscan.c:159
#7 0x80926c6 in ExecProcNode (node=0x8231d70, parent=0x8231d70) at execProcnode.c:262
#8 0x8091a79 in ExecutePlan (estate=0x8232080, plan=0x8231d70, operation=CMD_SELECT, offsetTuples=0, numberTuples=0, direction=ForwardScanDirection, destfunc=0x8233df0) at execMain.c:908
#9 0x809117e in ExecutorRun (queryDesc=0x8232068, estate=0x8232080, feature=3, limoffset=0x0, limcount=0x0) at execMain.c:339
#10 0x80df768 in ProcessQueryDesc (queryDesc=0x8232068, limoffset=0x0, limcount=0x0) at pquery.c:333
#11 0x80df7ce in ProcessQuery (parsetree=0x820c5b8, plan=0x8231d70, dest=Remote) at pquery.c:376
#12 0x80de228 in pg_exec_query_dest (query_string=0xbfffa634 "select * from tblarticle;", dest=Remote, aclOverride=0) at postgres.c:768
#13 0x80de107 in pg_exec_query (query_string=0xbfffa634 "select * from tblarticle;") at postgres.c:656
#14 0x80df18c in PostgresMain (argc=6, argv=0xbffff7b0, real_argc=5, real_argv=0xbffffd14) at postgres.c:1647
#15 0x80c92ea in DoBackend (port=0x81c6c10) at postmaster.c:1628
#16 0x80c8e0a in BackendStartup (port=0x81c6c10) at postmaster.c:1373
#17 0x80c8559 in ServerLoop () at postmaster.c:823
#18 0x80c8097 in PostmasterMain (argc=5, argv=0xbffffd14) at postmaster.c:616
#19 0x80a0666 in main (argc=5, argv=0xbffffd14) at main.c:97
#20 0x400facb3 in __libc_start_main (main=0x80a0600 <main>, argc=5, argv=0xbffffd14, init=0x8061374 <_init>, fini=0x810d8ec <_fini>, rtld_fini=0x4000a350 <_dl_fini>, stack_end=0xbffffd0c) at ../sysdeps/generic/libc-start.c:78
(gdb) print *att[j]
$7 = {attrelid = 23320, attname = {data = "sstatus", '\000' <repeats 24 times>, alignmentDummy = 1635021683},
  atttypid = 1042, attdisbursion = 0.999828398, attlen = -1, attnum = 14, attnelems = 0,
  attcacheoff = 1836085160, atttypmod = 5, attbyval = 0 '\000', attisset = 0 '\000', attalign = 105 'i',
  attnotnull = 0 '\000', atthasdef = 1 '\001'}
(gdb) print tp + off
$1 = 0xad93f9dc <Address 0xad93f9dc out of bounds>
(gdb) print tp
$2 = 0x40238a34 "@A;"
(gdb) print off
$3 = 1836085160
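The numbers above can be checked by hand: tp (0x40238a34) plus off (1836085160, i.e. 0x6d706fa8, the same value as attcacheoff) gives exactly the out-of-bounds address 0xad93f9dc, so the absurd value is the cached offset. Assuming the default 8192-byte BLCKSZ and that a 6.5.x heap tuple must fit in one block, no valid offset can be anywhere near that large. A minimal sketch of the arithmetic and of a sanity check, with hypothetical helper names (attr_address, attcacheoff_plausible, SKETCH_BLCKSZ) and 32-bit addresses as on this Intel box:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default BLCKSZ of 8192 bytes; a tuple must fit in one block. */
#define SKETCH_BLCKSZ 8192u

/* The crashing backend is 32-bit x86, so model addresses as 32-bit values. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 32 bits");

/* Same computation as "print tp + off": the attribute's address is the
 * start of the tuple data plus its (cached) byte offset. */
static uint32_t attr_address(uint32_t tp, uint32_t off)
{
    return tp + off;
}

/* Hypothetical sanity check: a usable offset must lie inside the tuple,
 * and the tuple itself must fit in one disk block. */
static int attcacheoff_plausible(uint32_t off, uint32_t tuple_len)
{
    return tuple_len <= SKETCH_BLCKSZ && off < tuple_len;
}
```

With the values from the session, attr_address(0x40238a34, 1836085160) returns 0xad93f9dc, the address gdb reports as out of bounds, and attcacheoff_plausible rejects that offset for any tuple that fits in a block.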

Is there anything else I should ask gdb to print? I will keep the core file
so I can answer any questions. Also, is there any fix for these killer
spinlocks, which take down all my backends at the same time?

- K

Kristofer Munn * KMI * 973-509-9414 * AIM KrMunn * http://www.munn.com/
