Tips for investigating DBMS failure

From: Karl DeBisschop <kdebisschop(at)range(dot)infoplease(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Tips for investigating DBMS failure
Date: 2000-10-16 16:50:03
Message-ID: 39EB31BB.A18C87C7@alert.infoplease.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

We had a production DBMS die in a failrly dramitica way this morning. I
suspect the cause may have been interactions between lock and pg_dump,
but I'm a long way off from being able to say that with any certainty
at all. In the end, I moved the data directroy off to the side and
restored all of if the data into a newly initiallized data directory.

The databases are backed up fairly frequently, but there is likely to
be some data since the previous backup that I'd like to recover. I
have tried starting a postmaster on a different port to get the data,
but I just can't seem to make any headway. Postmater runs, and I can
get copious debugging text, but psql always complains that the backend
failed unexpectedly whenever I try to make any sort of connection.

Can anyone give me general tips on how I might recover this data?
(PostgreSQL 7.0-2, redhat 6.2smp on a dual processor netfinity).

If it's of any use, I have attached the debug output - I don't see
where the problem is occurring, but I don't pretend to fully understand
the output, other than the exit is status 25. Someone with just a little
bit of insight would probably do alot better with this than I.

>>>>> debug
postmaster: ServerLoop: handling reading 4
postmaster: ServerLoop: handling reading 4
postmaster: ServerLoop: handling writing 4
postmaster: BackendStartup: environ dump:
-----------------------------------------
USERNAME=
HISTSIZE=1000
HISTFILESIZE=1000
SSH_TTY=/dev/pts/2
MAIL=/var/spool/mail/kdebisschop
TERM=xterm
HOSTTYPE=i386
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:
INPUTRC=/etc/inputrc
SHELL=/bin/bash
PS1=[\u(at)\h \W]\$
USER=postgres
OSTYPE=Linux
SHLVL=1
_=/usr/bin/sudo
SUDO_COMMAND=/usr/bin/postmaster -d 99 -D /disk/1/pgsql/data1 -p 5433
POSTPORT=5433
POSTID=2147483644
PGDATA=/disk/1/pgsql/data1
IPC_KEY=5433100
PG_USER=postgres
-----------------------------------------

postmaster: BackendStartup: pid 4532 user postgres db template1 socket 4
postmaster child[4532]: starting with (/usr/bin/postgres -d99 -v131072 -p template1 )
001016.12:36:23.302 [4532] read_pg_options: verbose=2,query=2,syslog=0
FindExec: found "/usr/bin/postgres" using argv[0]
001016.12:36:23.303 [4532] started: host=localhost user=postgres database=template1
001016.12:36:23.303 [4532] InitPostgres
001016.12:36:23.333 [4532] reset_client_encoding()..
001016.12:36:23.333 [4532] reset_client_encoding() done.
001016.12:36:23.333 [4532] StartTransactionCommand
001016.12:36:23.333 [4532] query: select getdatabaseencoding()
001016.12:36:23.337 [4532] parser outputs:

{ QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :unionall false :distinctClause <> :sortClause <> :rtable <> :targetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname getdatabaseencoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 19 :opType func :oper { FUNC :funcid 1039 :functype 19 :funcisindex false :funcsize 0 :func_fcache @ 0x0 :func_tlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname \<noname> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold -1 :varoattno 1}}) :func_planlist <>} :args <>}}) :qual <> :groupClause <> :havingQual <> :hasAggs false :hasSubLinks false :unionClause <> :intersectClause <> :limitOffset <> :limitCount <> :rowMark <>}

001016.12:36:23.337 [4532] after rewriting:
{ QUERY
:command 1
:utility <>
:resultRelation 0
:into <>
:isPortal false
:isBinary false
:isTemp false
:unionall false
:distinctClause <>
:sortClause <>
:rtable <>
:targetlist (

{ TARGETENTRY
:resdom
{ RESDOM
:resno 1
:restype 19
:restypmod -1
:resname getdatabaseencoding
:reskey 0
:reskeyop 0
:ressortgroupref 0
:resjunk false
}

:expr
{ EXPR
:typeOid 19
:opType func
:oper
{ FUNC
:funcid 1039
:functype 19
:funcisindex false
:funcsize 0
:func_fcache @ 0x0
:func_tlist (
{ TARGETENTRY
:resdom
{ RESDOM
:resno 1
:restype 19
:restypmod -1
:resname \<noname>
:reskey 0
:reskeyop 0
:ressortgroupref 0
:resjunk false
}

:expr
{ VAR
:varno -1
:varattno 1
:vartype 19
:vartypmod -1
:varlevelsup 0
:varnoold -1
:varoattno 1
}
}
)

:func_planlist <>
}

:args <>
}
}
)

:qual <>
:groupClause <>
:havingQual <>
:hasAggs false
:hasSubLinks false
:unionClause <>
:intersectClause <>
:limitOffset <>
:limitCount <>
:rowMark <>
}

001016.12:36:23.339 [4532] plan:

{ RESULT :startup_cost 0.00 :total_cost 0.00 :rows 0 :width 0 :state <> :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname getdatabaseencoding :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 19 :opType func :oper { FUNC :funcid 1039 :functype 19 :funcisindex false :funcsize 0 :func_fcache @ 0x0 :func_tlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 19 :restypmod -1 :resname \<noname> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno -1 :varattno 1 :vartype 19 :vartypmod -1 :varlevelsup 0 :varnoold -1 :varoattno 1}}) :func_planlist <>} :args <>}}) :qpqual <> :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :resconstantqual <>}

001016.12:36:23.339 [4532] ProcessQuery
001016.12:36:23.339 [4532] CommitTransactionCommand
postmaster: reaping dead processes...
postmaster: CleanupProc: pid 4532 exited with status 25
Server process (pid 4532) exited with status 25 at Mon Oct 16 12:36:23 2000
Terminating any active server processes...
Server processes were terminated at Mon Oct 16 12:36:23 2000
Reinitializing shared memory and semaphores
001016.12:36:23.340 [4508] shmem_exit(0)
binding ShmemCreate(key=52e771, size=1104896)
001016.12:36:23.343 [4533] DEBUG: Data Base System is starting up at Mon Oct 16 12:36:23 2000
001016.12:36:23.343 [4533] DEBUG: Data Base System was interrupted being in production at Mon Oct 16 12:35:44 2000
001016.12:36:23.365 [4533] DEBUG: Data Base System is in production state at Mon Oct 16 12:36:23 2000
001016.12:36:23.365 [4533] proc_exit(0)
001016.12:36:23.365 [4533] shmem_exit(0)
001016.12:36:23.365 [4533] exit(0)
postmaster: reaping dead processes...
<<<<<<<< debug

Many thanks for any insight.

--
Karl DeBisschop kdebisschop(at)alert(dot)infoplease(dot)com
Learning Network Reference http://www.infoplease.com
Netsaint Plugin Developer kdebisschop(at)users(dot)sourceforge(dot)net

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2000-10-16 16:50:47 Re: PL/Perl compilation error
Previous Message Jim Mercer 2000-10-16 16:33:39 Re: 7.1 Release Date