Re: sblock state on FreeBSD 6.1

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: sblock state on FreeBSD 6.1
Date: 2006-05-11 17:09:56
Message-ID: 20060511170956.GI99570@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 10, 2006 at 05:23:29PM -0500, Jim C. Nasby wrote:
> On Thu, May 11, 2006 at 12:09:32AM +0200, Martijn van Oosterhout wrote:
> > > > This stack trace doesn't make any sense. ClosePostmasterPorts is not
> > > > calling PostgresMain. And pq_getbyte is not calling TouchSocketFile,
> > > > which in turn isn't calling secure_read.
> > >
> > > So I see... that's rather disturbing... any idea why gdb would end up
> > > that confused?
> >
> > Given you don't have debug enabled, it's likely all static symbols have
> > been dropped from the symbol table and gdb is guessing by listing the
> > function with the highest address before the actual function.
> >
> > You could try to find a consistant call tree where the functions call
> > eachother in the right order. Or just compile with debug symbols
> > enabled. It doesn't make a difference in performance or memory usage,
> > only diskspace (about 16MB on my machine, probably more on yours).
>
> Yeah, I was actually surprised when gdb spit out real function names and
> not just gibberish since I assumed that PostgreSQL wasn't compiled with
> debug symbols. I thought my assumption was wrong when that stuff came
> out, but I guess that was a bad guess...
>
> I'll recompile and try again... thanks!

Ok, got some good traces this time, with debug. Even better, by upping
the number of pgpool connections to 395, we were able to exactly
duplicate the problem we saw earlier in production (I know, who'd think
a dual opteron would have trouble servicing 400 simultaneous connections
at once? ;)

Unfortunately, I suspect some of these were grabbed after the process
had already moved past whatever was holding it in sblock.

Here's the traces that we captured...

Got 2 of these:
#0 0x000000080135bd2c in recvfrom () from /lib/libc.so.6
#1 0x00000000004f9898 in secure_read (port=0x834800, ptr=0x7cebe0,
len=8192) at be-secure.c:320
#2 0x00000000004fed7b in pq_recvbuf () at pqcomm.c:697
#3 0x00000000004fee27 in pq_getbyte () at pqcomm.c:738
#4 0x000000000055febf in PostgresMain (argc=4, argv=0x83b828,
username=0x83b740 "www") at postgres.c:289
#5 0x000000000053a487 in ServerLoop () at postmaster.c:2851
#6 0x000000000053bab7 in PostmasterMain (argc=3,
argv=0x7fffffffebb0) at postmaster.c:941
#7 0x0000000000500436 in main (argc=3, argv=0x7fffffffebb0) at
main.c:265

#0 0x000000080137638c in sendto () from /lib/libc.so.6
#1 0x0000000000535e67 in pgstat_report_tabstat () at pgstat.c:846
#2 0x000000000055ff75 in PostgresMain (argc=4, argv=0x83b828,
username=0x83b740 "www") at postgres.c:3162
#3 0x000000000053a487 in ServerLoop () at postmaster.c:2851
#4 0x000000000053bab7 in PostmasterMain (argc=3,
argv=0x7fffffffebb0) at postmaster.c:941
#5 0x0000000000500436 in main (argc=3, argv=0x7fffffffebb0) at
main.c:265

I suspect the rest of these probably happened after the sblock state had
cleared, but here they are anyway in case I'm wrong. Also, I removed the
'incriminating evidence' from the query strings; there wasn't anything
unusual about those queries, so I don't think it should matter.

#0 0x000000080138fcec in __syscall () from /lib/libc.so.6
#1 0x000000000054e67f in FileSeek (file=199, offset=0, whence=2) at
fd.c:1173
#2 0x000000000055aca3 in mdnblocks (reln=0x9f68c0) at md.c:972
#3 0x000000000055bb4a in smgrnblocks (reln=0x9f68c0) at smgr.c:571
#4 0x000000000052fbde in get_relation_info (relationObjectId=199,
rel=0xad3728) at plancat.c:206
#5 0x0000000000531041 in make_reloptinfo (root=0xac13e0, relid=6,
reloptkind=RELOPT_BASEREL) at relnode.c:146
#6 0x0000000000521ed3 in add_base_rels_to_query (root=0xac13e0,
jtnode=0xad2600) at initsplan.c:88
#7 0x0000000000521e93 in add_base_rels_to_query (root=0xac13e0,
jtnode=0xac9cd8) at initsplan.c:94
#8 0x0000000000521ed3 in add_base_rels_to_query (root=0xac13e0,
jtnode=0xad2710) at initsplan.c:88
#9 0x00000000005238cf in query_planner (root=0xac13e0,
tlist=0xace648, tuple_fraction=10, cheapest_path=0x7fffffffd308,
sorted_path=0x7fffffffd310, num_groups=0x7fffffffd318) at
planmain.c:141
#10 0x00000000005240cb in grouping_planner (root=0xac13e0,
tuple_fraction=10) at planner.c:796
#11 0x00000000005252c2 in subquery_planner (parse=0x9f0140,
tuple_fraction=0, subquery_pathkeys=0x0) at planner.c:374
#12 0x0000000000525423 in planner (parse=0x9f0140, isCursor=0 '\0',
cursorOptions=0, boundParams=0x0) at planner.c:134
#13 0x000000000055ddec in pg_plan_query (querytree=0x9f0140,
boundParams=0x0) at postgres.c:725
#14 0x000000000055de7c in pg_plan_queries (querytrees=0xc7,
boundParams=0x0, needSnapshot=0 '\0') at postgres.c:793
#15 0x000000000055e986 in exec_simple_query (
query_string=0x9f0250 "<snip>"...) at postgres.c:944
#16 0x000000000056031e in PostgresMain (argc=4, argv=0x83b828,
username=0x83b740 "www") at postgres.c:3217
#17 0x000000000053a487 in ServerLoop () at postmaster.c:2851
#18 0x000000000053bab7 in PostmasterMain (argc=3,
argv=0x7fffffffebb0) at postmaster.c:941
#19 0x0000000000500436 in main (argc=3, argv=0x7fffffffebb0) at
main.c:265

#0 hash_search (hashp=0x83b3f0, keyPtr=0x7fffffffd090,
action=HASH_FIND, foundPtr=0x0) at dynahash.c:585
#1 0x000000000054aa47 in BufTableLookup (tagPtr=0x7fffffffd09c) at
buf_table.c:82
#2 0x000000000054bbfd in ReadBuffer (reln=0xa24050, blockNum=7811)
at bufmgr.c:310
#3 0x000000000044a7c5 in heap_release_fetch (relation=0xa24050,
snapshot=0xa1f830, tuple=0xab7930, userbuf=0xab7950, keep_buf=1 '\001',
pgstat_info=0xab7960) at heapam.c:851
#4 0x000000000045043e in index_getnext (scan=0xab78e8,
direction=ForwardScanDirection) at indexam.c:547
#5 0x00000000004ef543 in IndexNext (node=0xab6530) at
nodeIndexscan.c:118
#6 0x00000000004e837f in ExecScan (node=0xab6530, accessMtd=0x4ef4c0
<IndexNext>) at execScan.c:103
#7 0x00000000004e38bd in ExecProcNode (node=0xab6530) at
execProcnode.c:325
#8 0x00000000004f13a1 in ExecNestLoop (node=0xa20880) at
nodeNestloop.c:135
#9 0x00000000004e386d in ExecProcNode (node=0xa20880) at
execProcnode.c:350
#10 0x00000000004f26d3 in ExecSort (node=0xa20770) at nodeSort.c:101
#11 0x00000000004e382a in ExecProcNode (node=0xa20770) at
execProcnode.c:369
#12 0x00000000004f2d43 in ExecLimit (node=0xa20550) at nodeLimit.c:84
#13 0x00000000004e37cd in ExecProcNode (node=0xa20550) at
execProcnode.c:393
#14 0x00000000004e2ad2 in ExecutorRun (queryDesc=0xa20550,
direction=ForwardScanDirection, count=0) at execMain.c:1110
#15 0x0000000000561e18 in PortalRunSelect (portal=0xa05030,
forward=-64 '?', count=0, dest=0xab2dd0) at pquery.c:794
#16 0x00000000005624bf in PortalRun (portal=0xa05030,
count=9223372036854775807, dest=0xab2dd0, altdest=0xab2dd0,
completionTag=0x7fffffffd5f0 "")
at pquery.c:646
#17 0x000000000055ea36 in exec_simple_query (
query_string=0x9f3250 "<snip>"...) at postgres.c:1002
#18 0x000000000056031e in PostgresMain (argc=4, argv=0x83b828,
username=0x83b740 "www") at postgres.c:3217
#19 0x000000000053a487 in ServerLoop () at postmaster.c:2851
#20 0x000000000053bab7 in PostmasterMain (argc=3,
argv=0x7fffffffebb0) at postmaster.c:941
#21 0x0000000000500436 in main (argc=3, argv=0x7fffffffebb0) at
main.c:265

#0 0x00000008013190ec in semop () from /lib/libc.so.6
#1 0x0000000000532da0 in PGSemaphoreUnlock (sema=0x87b837920) at
pg_sema.c:443
#2 0x000000000055949c in LWLockRelease (lockid=LockMgrLock) at
lwlock.c:554
#3 0x00000000005557b2 in LockAcquire (lockmethodid=1,
locktag=0x7fffffffd270, isTempObject=-120 '\210', lockmode=1,
sessionLock=-72 '?',
dontWait=0 '\0') at lock.c:771
#4 0x0000000000554544 in LockRelation (relation=0xa4d318,
lockmode=0) at lmgr.c:144
#5 0x000000000044fec8 in index_beginscan_internal
(indexRelation=0xa4d318, nkeys=1, key=0xaa5df8) at indexam.c:300
#6 0x000000000044ff7e in index_beginscan_multi
(indexRelation=0x20011, snapshot=0xaa3030, nkeys=-11904, key=0x0) at
indexam.c:267
#7 0x00000000004edb5a in ExecInitBitmapIndexScan (node=0xab5c20,
estate=0xaa4030) at nodeBitmapIndexscan.c:286
#8 0x00000000004e3685 in ExecInitNode (node=0xab5c20,
estate=0xaa4030) at execProcnode.c:167
#9 0x00000000004ecf6e in ExecInitBitmapOr (node=0xab5dd0,
estate=0xaa4030) at nodeBitmapOr.c:87
#10 0x00000000004e36b5 in ExecInitNode (node=0xab6ae0,
estate=0xaa4030) at execProcnode.c:152
#11 0x00000000004ed82a in ExecInitBitmapHeapScan (node=0xab7198,
estate=0xaa4030) at nodeBitmapHeapscan.c:501
#12 0x00000000004e3675 in ExecInitNode (node=0xab7198,
estate=0xaa4030) at execProcnode.c:171
#13 0x00000000004e1c42 in ExecutorStart (queryDesc=0xaa3440,
explainOnly=-128 '\200') at execMain.c:617
#14 0x0000000000561a0c in PortalStart (portal=0xa05030, params=0x0,
snapshot=0x0) at pquery.c:399
#15 0x000000000055e9d8 in exec_simple_query (
query_string=0x9f3030 "<snip>") at postgres.c:965
#16 0x000000000056031e in PostgresMain (argc=4, argv=0x83b828,
username=0x83b740 "www") at postgres.c:3217
#17 0x000000000053a487 in ServerLoop () at postmaster.c:2851
#18 0x000000000053bab7 in PostmasterMain (argc=3,
argv=0x7fffffffebb0) at postmaster.c:941
#19 0x0000000000500436 in main (argc=3, argv=0x7fffffffebb0) at
main.c:265
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2006-05-11 17:18:06 Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message Martijn van Oosterhout 2006-05-11 16:41:51 Re: Bug in signal handler