segfault in pg 8.4, CurrentResourceOwner == NULL while processing SIGTERM

From: Dennis Koegel <dk(at)openit(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: segfault in pg 8.4, CurrentResourceOwner == NULL while processing SIGTERM
Date: 2010-03-19 12:34:45
Message-ID: 20100319123445.GA20048@openit.de
Lists: pgsql-bugs

Hi,

A week ago, we migrated our production setup -- three amd64
machines on RHEL4 running only postgres and slony -- from pg 8.1 to
pg 8.4.

Since moving to 8.4, we have been seeing rare crashes (between two
and ten per day across the whole setup) of single backend server
processes, loosely correlated with very high load. This occurs on all
three machines. Of course, the system then immediately aborts
everything and puts the database into recovery mode.

The crash is always triggered by dereferencing CurrentResourceOwner,
which is NULL:

#0 ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
601 if (owner->ncatrefs < owner->maxcatrefs)

We automatically SIGTERM long-running queries (not ideal, of course,
but necessary under certain load conditions here). We did the same
on 8.1, where it always worked fine. Even with 8.4 it works fine
most of the time.

It appears, though, that since 8.4 we occasionally hit a condition
where CurrentResourceOwner is NULL while the process is dying.

At frame #8 (ProcessInterrupts), postgres starts processing such
a SIGTERM. Here's the full trace:

#0 ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
#1 0x000000000066501e in SearchCatCache (cache=0xa33d40, v1=Variable "v1" is not available.) at catcache.c:1182
#2 0x0000000000535cb5 in sql_exec_error_callback (arg=Variable "arg" is not available.) at functions.c:883
#3 0x0000000000672f8c in errfinish (dummy=Variable "dummy" is not available.) at elog.c:386
#4 0x00000000005ced05 in log_disconnections (code=Variable "code" is not available.) at postgres.c:4019
#5 0x00000000005bc950 in proc_exit_prepare (code=1) at ipc.c:182
#6 0x00000000005bc978 in proc_exit (code=0) at ipc.c:95
#7 0x0000000000673106 in errfinish (dummy=Variable "dummy" is not available.) at elog.c:488
#8 0x00000000005cade1 in ProcessInterrupts () at postgres.c:2653
#9 0x000000000052d4b5 in ExecProcNode (node=0xc87ff0) at execProcnode.c:330
#10 0x000000000052b37a in standard_ExecutorRun (queryDesc=0xc82180, direction=ForwardScanDirection, count=1) at execMain.c:1504
#11 0x00000000005367e4 in fmgr_sql (fcinfo=0x7fbfffbb70) at functions.c:472
#12 0x000000000052f595 in ExecMakeFunctionResult (fcache=Variable "fcache" is not available.) at execQual.c:1685
#13 0x00000000005327da in ExecEvalExprSwitchContext (expression=0x0, econtext=0x401d, isNull=0x401d <Address 0x401d out of bounds>, isDone=0xa39b08) at execQual.c:4095
#14 0x0000000000591d74 in evaluate_expr (expr=Variable "expr" is not available.) at clauses.c:3842
#15 0x00000000005923a8 in simplify_function (funcid=16413, result_type=25, result_typmod=-1, args=0x7fbfffc3f8, allow_inline=1 '\001', context=0x7fbfffc680) at clauses.c:3447
#16 0x0000000000592af9 in eval_const_expressions_mutator (node=0x2b9f654778, context=0x7fbfffc680) at clauses.c:2134
#17 0x0000000000554887 in expression_tree_mutator (node=0x2b9f654728, mutator=0x592740 <eval_const_expressions_mutator>, context=0x7fbfffc680) at nodeFuncs.c:1944
#18 0x0000000000592b5a in eval_const_expressions_mutator (node=0x2b9f6546d8, context=0x7fbfffc680) at clauses.c:2167
#19 0x0000000000554887 in expression_tree_mutator (node=0x2b9f654578, mutator=0x592740 <eval_const_expressions_mutator>, context=0x7fbfffc680) at nodeFuncs.c:1944
#20 0x0000000000592b5a in eval_const_expressions_mutator (node=0x2b9f654528, context=0x7fbfffc680) at clauses.c:2167
#21 0x0000000000592dcb in eval_const_expressions_mutator (node=Variable "node" is not available.) at clauses.c:3118
#22 0x00000000005938af in eval_const_expressions (root=Variable "root" is not available.) at clauses.c:2027
#23 0x0000000000584fa6 in preprocess_expression (root=0x2b9f64e5a0, expr=0xa39b08, kind=0) at planner.c:540
#24 0x0000000000585119 in preprocess_qual_conditions (root=0x2b9f64e5a0, jtnode=0x2b9f6540a0) at planner.c:602
#25 0x0000000000587863 in subquery_planner (glob=0xcb78c0, parse=0x2b9f6523f0, parent_root=0xcd2648, hasRecursion=Variable "hasRecursion" is not available.) at planner.c:383
#26 0x0000000000572444 in set_rel_pathlist (root=0xcd2648, rel=0x2b9f64c6d0, rti=2, rte=0xcc1340) at allpaths.c:630
#27 0x00000000005729b8 in make_one_rel (root=0xcd2648, joinlist=0x2b9f64c6a0) at allpaths.c:157
#28 0x0000000000584c5c in query_planner (root=0xcd2648, tlist=Variable "tlist" is not available.) at planmain.c:252
#29 0x0000000000585a30 in grouping_planner (root=0xcd2648, tuple_fraction=0) at planner.c:1006
#30 0x0000000000587b0e in subquery_planner (glob=0xcb78c0, parse=0xcb7188, parent_root=0x2b9f639d00, hasRecursion=Variable "hasRecursion" is not available.) at planner.c:481
#31 0x0000000000587d92 in standard_planner (parse=0xcb7188, cursorOptions=0, boundParams=0x0) at planner.c:190
#32 0x0000000000587f45 in planner (parse=0x0, cursorOptions=16413, boundParams=0x401d) at planner.c:122
#33 0x00000000005cb82b in pg_plan_query (querytree=0xcb7188, cursorOptions=0, boundParams=0x0) at postgres.c:697
#34 0x00000000005cb8de in pg_plan_queries (querytrees=Variable "querytrees" is not available.) at postgres.c:756
#35 0x0000000000547b40 in _SPI_prepare_plan ( src=0xc37b88 "select null::int, null::text, null::int, null::text, null::int, null::text,\n", ' ' <repeats 14 times>, "count(*),\n", ' ' <repeats 14 times>, "count( distinct company_id ) as company_count\n from ( select c.*, publication_"..., plan=0x7fbfffcf80, boundParams=0x0) at spi.c:1641
#36 0x0000000000548c78 in SPI_cursor_open_with_args (name=0x0, src=0xc37b88 "select null::int, null::text, null::int, null::text, null::int, null::text,\n", ' ' <repeats 14 times>, "count(*),\n", ' ' <repeats 14 times>, "count( distinct company_id ) as company_count\n from ( select c.*, publication_"..., nargs=Variable "nargs" is not available.) at spi.c:1000
#37 0x0000002b9f7263a3 in exec_dynquery_with_params (estate=0x7fbfffd380, dynquery=Variable "dynquery" is not available.) at pl_exec.c:5454
#38 0x0000002b9f726e6e in exec_stmts (estate=0x7fbfffd380, stmts=Variable "stmts" is not available.) at pl_exec.c:2305
#39 0x0000002b9f726f4a in exec_stmts (estate=0x7fbfffd380, stmts=Variable "stmts" is not available.) at pl_exec.c:1439
#40 0x0000002b9f728061 in exec_stmt_block (estate=0x0, block=0xb89038) at pl_exec.c:1137
#41 0x0000002b9f728bb9 in plpgsql_exec_function (func=0xc86db0, fcinfo=0x7fbfffd7c0) at pl_exec.c:315
#42 0x0000002b9f71f940 in plpgsql_call_handler (fcinfo=0x7fbfffd7c0) at pl_handler.c:95
#43 0x0000000000678a7c in fmgr_security_definer (fcinfo=0x0) at fmgr.c:962
#44 0x000000000052f954 in ExecMakeTableFunctionResult (funcexpr=0xbc3880, econtext=0xbc05c0, expectedDesc=0xbc0eb0, randomAccess=0 '\0') at execQual.c:1912
#45 0x000000000053ef71 in FunctionNext (node=0xbc04b0) at nodeFunctionscan.c:64
#46 0x0000000000533f51 in ExecScan (node=0x0, accessMtd=0x53ef00 <FunctionNext>) at execScan.c:68
#47 0x000000000052d5cd in ExecProcNode (node=0xbc04b0) at execProcnode.c:385
#48 0x000000000053817e in ExecAgg (node=0xbbfd30) at nodeAgg.c:1054
#49 0x000000000052d52a in ExecProcNode (node=0xbbfd30) at execProcnode.c:431
#50 0x000000000052b37a in standard_ExecutorRun (queryDesc=0xa94670, direction=ForwardScanDirection, count=0) at execMain.c:1504
#51 0x00000000005cf869 in PortalRunSelect (portal=0xae3850, forward=Variable "forward" is not available.) at pquery.c:953
#52 0x00000000005d0080 in PortalRun (portal=0xae3850, count=9223372036854775807, isTopLevel=1 '\001', dest=0xa62f10, altdest=0xa62f10, completionTag=0x7fbfffe4f0 "") at pquery.c:779
#53 0x00000000005cd865 in PostgresMain (argc=Variable "argc" is not available.) at postgres.c:1928
#54 0x00000000005a4810 in ServerLoop () at postmaster.c:3447
#55 0x00000000005a619d in PostmasterMain (argc=3, argv=0x9b0540) at postmaster.c:1040
#56 0x0000000000552a5e in main (argc=3, argv=0x9b0540) at main.c:188
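
To make the chain in frames #0 to #3 easier to follow, here is a
minimal standalone model of it. This is only a sketch and not the
PostgreSQL source or a proposed fix: the Owner struct, current_owner,
enlarge_cat_refs_checked() and error_context_callback() are
hypothetical stand-ins for the resource owner, CurrentResourceOwner,
ResourceOwnerEnlargeCatCacheRefs and sql_exec_error_callback. It
assumes only what the trace shows: an error context callback does a
catalog lookup during exit, and the lookup dereferences an owner
pointer that is NULL at that point.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resource owner; NOT the PostgreSQL struct. */
typedef struct Owner {
    int ncatrefs;    /* references currently tracked */
    int maxcatrefs;  /* capacity of the tracking array */
} Owner;

/* Models CurrentResourceOwner, which is NULL while the process is dying. */
static Owner *current_owner = NULL;

/*
 * Models the check at resowner.c:601, with an added NULL guard.
 * Returns -1 if there is no owner, 0 if room remains, 1 if the
 * capacity had to grow. The unguarded original would crash on
 * owner->ncatrefs when owner is NULL.
 */
static int enlarge_cat_refs_checked(Owner *owner)
{
    if (owner == NULL)
        return -1;
    if (owner->ncatrefs < owner->maxcatrefs)
        return 0;
    owner->maxcatrefs = owner->maxcatrefs ? owner->maxcatrefs * 2 : 16;
    return 1;
}

/*
 * Models an error context callback that wants to do a catalog lookup
 * while a message is being logged. It skips the lookup when no owner
 * is available instead of dereferencing NULL.
 */
static const char *error_context_callback(void)
{
    if (enlarge_cat_refs_checked(current_owner) < 0)
        return "no owner: skip catalog lookup";
    current_owner->ncatrefs++;
    return "catalog lookup done";
}
```

With current_owner left at NULL, error_context_callback() takes the
guarded path; in the real backtrace there is no such guard, which is
where the segfault occurs.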

We are running 8.4.2 on this setup; we tried running the setup with
8.4.3 temporarily, but it resulted in the exact same crashes, so we
reverted to the 8.4.2 installation.

If you need further information, we're happy to assist.

Thank you!

Best regards,
--
Dennis Kögel, Systemadministration
Telefon +49-211-239577-0, Telefax -10, http://www.openit.de/

OpenIT GmbH, In der Steele 33a - 41, 40599 Düsseldorf, Germany
HRB 38815 Amtsgericht Düsseldorf, USt-Id DE 812951861
Geschäftsführer: Oliver Haakert, Maurice Kemmann
