Re: BUG #5905: segfault on pg_terminate_backend

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Matt Harrington <matt(dot)harrington(at)rentrak(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5905: segfault on pg_terminate_backend
Date: 2011-03-01 19:26:48
Message-ID: AANLkTik_NFbQ5eXizWJrK_cRjBYtxUJKVSMhtz2GT3jT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Mar 1, 2011 at 12:50 PM, Matt Harrington
<matt(dot)harrington(at)rentrak(dot)com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5905
> Logged by:          Matt Harrington
> Email address:      matt(dot)harrington(at)rentrak(dot)com
> PostgreSQL version: 8.4.3
> Operating system:   CentOS 4.7
> Description:        segfault on pg_terminate_backend
> Details:
>
> I have occasionally triggered a segfault while executing
> pg_terminate_backend (or kill on child postgres processes.)  I still have
> the core file and associated logs.
>
> $ gdb -c core.5746 /usr/local/stow/postgresql-8.4.3/bin/postmaster
> GNU gdb Red Hat Linux (6.3.0.0-1.159.el4rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> Core was generated by `postgres: postgres att_dev [local] SELECT
>                        '.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib64/libcrypt.so.1...done.
> Loaded symbols for /lib64/libcrypt.so.1
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/tls/libm.so.6...done.
> Loaded symbols for /lib64/tls/libm.so.6
> Reading symbols from /lib64/tls/libc.so.6...done.
> Loaded symbols for /lib64/tls/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so...done.
> Loaded symbols for
> /data_storage/local/stow/postgresql-8.4.3/lib/auto_explain.so
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so...done.
> Loaded symbols for
> /data_storage/local/stow/postgresql-8.4.3/lib/pg_stat_statements.so
> Reading symbols from
> /data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.so...done.
> Loaded symbols for /data_storage/local/stow/postgresql-8.4.3/lib/plpgsql.so
> #0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
> 601             if (owner->ncatrefs < owner->maxcatrefs)
> (gdb) bt
> #0  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.c:601
> #1  0x000000000066586e in SearchCatCache (cache=0xa26ff0, v1=Variable "v1"
> is not available.
> ) at catcache.c:1182
> #2  0x0000000000536095 in sql_exec_error_callback (arg=Variable "arg" is not
> available.
> ) at functions.c:883
> #3  0x000000000067355c in errfinish (dummy=Variable "dummy" is not
> available.
> ) at elog.c:386
> #4  0x00000000005cf3e5 in log_disconnections (code=Variable "code" is not
> available.
> ) at postgres.c:4019
> #5  0x00000000005bd020 in proc_exit_prepare (code=1) at ipc.c:182
> #6  0x00000000005bd048 in proc_exit (code=0) at ipc.c:95
> #7  0x00000000006736d6 in errfinish (dummy=Variable "dummy" is not
> available.
> ) at elog.c:488
> #8  0x00000000005cb4c1 in ProcessInterrupts () at postgres.c:2653
> #9  0x00000000005342fa in ExecScan (node=0xaccb60, accessMtd=0x540000
> <SeqNext>) at execScan.c:101
> #10 0x000000000052d86d in ExecProcNode (node=0xaccb60) at
> execProcnode.c:363
> #11 0x000000000053c83c in ExecHashJoin (node=0xb49920) at
> nodeHashjoin.c:131
> #12 0x000000000052d7bd in ExecProcNode (node=0xb49920) at
> execProcnode.c:412
> #13 0x0000000000540b1a in ExecSort (node=0xb49810) at nodeSort.c:102
> #14 0x000000000052d79a in ExecProcNode (node=0xb49810) at
> execProcnode.c:423
> #15 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xb153a0,
> direction=ForwardScanDirection, count=10) at execMain.c:1504
> #16 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xb153a0,
> direction=ForwardScanDirection, count=10) at auto_explain.c:168
> #17 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xb153a0,
> direction=ForwardScanDirection, count=10) at pg_stat_statements.c:514
> #18 0x00000000005cff49 in PortalRunSelect (portal=0xaa2db0, forward=Variable
> "forward" is not available.
> ) at pquery.c:953
> #19 0x00000000005d0e80 in PortalRunFetch (portal=0xaa2db0,
> fdirection=FETCH_FORWARD, count=10, dest=0x92db20) at pquery.c:1402
> #20 0x0000000000548a81 in _SPI_cursor_operation (portal=0xaa2db0,
> direction=FETCH_FORWARD, count=10, dest=0x92db20) at spi.c:2088
> #21 0x0000002ae7a195f4 in exec_for_query (estate=0x7fbfffcd50,
> stmt=0xaec558, portal=0xaa2db0, prefetch_ok=1 '\001') at pl_exec.c:4211
> #22 0x0000002ae7a197f6 in exec_stmt_fors (estate=0x7fbfffcd50,
> stmt=0xaec558) at pl_exec.c:1852
> #23 0x0000002ae7a16699 in exec_stmts (estate=0x7fbfffcd50, stmts=Variable
> "stmts" is not available.
> ) at pl_exec.c:1269
> #24 0x0000002ae7a17ff1 in exec_stmt_block (estate=0x0, block=0xaec650) at
> pl_exec.c:1137
> #25 0x0000002ae7a18b69 in plpgsql_exec_function (func=0xa88458,
> fcinfo=0x7fbfffd020) at pl_exec.c:315
> #26 0x0000002ae7a0f940 in plpgsql_call_handler (fcinfo=0x7fbfffd020) at
> pl_handler.c:95
> #27 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=0xabe000,
> econtext=0xabdd60, expectedDesc=0xabdec0, randomAccess=0 '\0')
>    at execQual.c:2016
> #28 0x000000000053f561 in FunctionNext (node=0xabdc50) at
> nodeFunctionscan.c:64
> #29 0x0000000000534331 in ExecScan (node=0x0, accessMtd=0x53f4f0
> <FunctionNext>) at execScan.c:68
> #30 0x000000000052d81d in ExecProcNode (node=0xabdc50) at
> execProcnode.c:385
> #31 0x0000000000537acc in ExecAppend (node=0xaa5dd0) at nodeAppend.c:269
> #32 0x000000000052d88d in ExecProcNode (node=0xaa5dd0) at
> execProcnode.c:348
> #33 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xb5dbc0,
> direction=ForwardScanDirection, count=0) at execMain.c:1504
> #34 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xb5dbc0,
> direction=ForwardScanDirection, count=0) at auto_explain.c:168
> #35 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xb5dbc0,
> direction=ForwardScanDirection, count=0) at pg_stat_statements.c:514
> #36 0x0000000000536da4 in fmgr_sql (fcinfo=0x7fbfffd9d0) at functions.c:472
> #37 0x000000000052fd64 in ExecMakeTableFunctionResult (funcexpr=0xaf14a0,
> econtext=0xaf11d0, expectedDesc=0xaf1360, randomAccess=0 '\0')
>    at execQual.c:2016
> #38 0x000000000053f561 in FunctionNext (node=0xaf10c0) at
> nodeFunctionscan.c:64
> #39 0x0000000000534331 in ExecScan (node=0x0, accessMtd=0x53f4f0
> <FunctionNext>) at execScan.c:68
> #40 0x000000000052d81d in ExecProcNode (node=0xaf10c0) at
> execProcnode.c:385
> #41 0x000000000052b5ca in standard_ExecutorRun (queryDesc=0xa98450,
> direction=ForwardScanDirection, count=0) at execMain.c:1504
> ---Type <return> to continue, or q <return> to quit---
> #42 0x0000002a98417197 in explain_ExecutorRun (queryDesc=0xa98450,
> direction=ForwardScanDirection, count=0) at auto_explain.c:168
> #43 0x0000002a9851a3e3 in pgss_ExecutorRun (queryDesc=0xa98450,
> direction=ForwardScanDirection, count=0) at pg_stat_statements.c:514
> #44 0x00000000005cff49 in PortalRunSelect (portal=0xaa2ca0, forward=Variable
> "forward" is not available.
> ) at pquery.c:953
> #45 0x00000000005d0760 in PortalRun (portal=0xaa2ca0,
> count=9223372036854775807, isTopLevel=1 '\001', dest=0xa55c98,
> altdest=0xa55c98,
>    completionTag=0x7fbfffe360 "") at pquery.c:779
> #46 0x00000000005cc2d6 in exec_simple_query (
>    query_string=0xa53da0 "select * from
> rtk_role_maint.sql_to_refresh_grants_for_dml       ('att', 'att_dml') ;") at
> postgres.c:991
> #47 0x00000000005cdcdf in PostgresMain (argc=4, argv=Variable "argv" is not
> available.
> ) at postgres.c:3614
> #48 0x00000000005a4ee0 in ServerLoop () at postmaster.c:3447
> #49 0x00000000005a686d in PostmasterMain (argc=3, argv=0x9b04e0) at
> postmaster.c:1040
> #50 0x00000000005530be in main (argc=3, argv=0x9b04e0) at main.c:188
>
>
> $ grep -A70 'Segmentation fault' postgresql-2011-03-01_000000.log
> <6719, 2011-03-01 06:00:15 PST> LOG:  server process (PID 5746) was
> terminated by signal 11: Segmentation fault
> <6719, 2011-03-01 06:00:15 PST> LOG:  terminating any other active server
> processes
> <5940, 2011-03-01 06:00:15 PST> WARNING:  terminating connection because of
> crash of another server process
> <5940, 2011-03-01 06:00:15 PST> DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> <5940, 2011-03-01 06:00:15 PST> HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> ...
> <6040, 2011-03-01 06:00:17 PST> FATAL:  the database system is in recovery
> mode
> <6039, 2011-03-01 06:00:17 PST> LOG:  database system was interrupted; last
> known up at 2011-03-01 05:51:18 PST
> <6039, 2011-03-01 06:00:17 PST> LOG:  database system was not properly shut
> down; automatic recovery in progress
> <6039, 2011-03-01 06:00:17 PST> LOG:  redo starts at 7D0/6FF7E490
> <6039, 2011-03-01 06:00:17 PST> LOG:  unexpected pageaddr 7D0/20F9A000 in
> log file 2000, segment 111, offset 16359424
> <6039, 2011-03-01 06:00:17 PST> LOG:  redo done at 7D0/6FF97728
> <6039, 2011-03-01 06:00:17 PST> LOG:  last completed transaction was at log
> time 2011-03-01 06:00:04.083136-08
> <6039, 2011-03-01 06:00:17 PST> LOG:  checkpoint starting: end-of-recovery
> immediate
> <6039, 2011-03-01 06:00:17 PST> LOG:  checkpoint complete: wrote 24 buffers
> (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.009
> s, sync=0.002 s, total=0.011 s
> <6044, 2011-03-01 06:00:18 PST> LOG:  autovacuum launcher started
> <6719, 2011-03-01 06:00:18 PST> LOG:  database system is ready to accept
> connections
>
> Unfortunately, I have not been successful in generating a reproducible case;
> it may be a race condition edge case.

did you read: http://archives.postgresql.org/pgsql-bugs/2010-03/msg00170.php?

this may have been fixed -- check follow ups in pgsql-commits. and
upgrade to latest 8.4!!

merlin

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Matt Harrington 2011-03-01 19:30:07 Re: BUG #5905: segfault on pg_terminate_backend
Previous Message Heikki Linnakangas 2011-03-01 19:26:45 Re: BUG #5905: segfault on pg_terminate_backend