From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: hung backends stuck in spinlock heavy endless loop |
Date: | 2015-01-15 19:15:42 |
Message-ID: | 20150115191542.GD14782@awork2.anarazel.de |
Lists: | pgsql-hackers |

Hi,
> The plot thickens! I looped the test, still stock 9.4 as of this time
> and went to lunch. When I came back, the database was in recovery
> mode. Here is the rough sequence of events.
>
Whoa. That looks scary. Did you see (some of) those errors before? Most
of them should have been emitted regardless of whether the server was
built with assertions enabled.
> 1) REINDEXED pg_class (wanted clean slate for full reproduction)
> 2) before too long (unfortunately did not configure timestamps in the
> log) started seeing:
>
> ERROR: root page 3 of index "pg_class_oid_index" has level 0,
> expected 1 at character 8
> QUERY: UPDATE CDSRunTable SET
> Finished = clock_timestamp(),
> DidSucceed = _DidSucceed,
> ErrorMessage = _ErrorMessage
> WHERE CDSRunTableId = _CdsRunTableId
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
> SQL statement
> STATEMENT: SELECT CDSReconcileRunTable(2020)
>
> ..and again with a FATAL
> FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1
> ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1
> CONTEXT: SQL statement "UPDATE CDSRunTable SET
> Finished = clock_timestamp(),
> DidSucceed = _DidSucceed,
> ErrorMessage = _ErrorMessage
> WHERE CDSRunTableId = _CdsRunTableId"
> PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement
Just to be absolutely clear, those happened *before* the following
errors? And there were no errors of the 'during exception cleanup' kind
before that?
> 3) shortly (?) after that, I got:
>
> WARNING: did not find subXID 14955 in MyProc
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> WARNING: you don't own a lock of type RowExclusiveLock
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> LOG: could not send data to client: Broken pipe
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT: SELECT CDSReconcileRunTable(2151)
> WARNING: ReleaseLockIfHeld: failed??
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> ERROR: failed to re-find shared proclock object
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT: SELECT CDSReconcileRunTable(2151)
> WARNING: AbortSubTransaction while in ABORT state
> WARNING: did not find subXID 14955 in MyProc
> WARNING: you don't own a lock of type AccessShareLock
> WARNING: ReleaseLockIfHeld: failed??
> ERROR: failed to re-find shared proclock object
> WARNING: AbortSubTransaction while in ABORT state
> WARNING: did not find subXID 14955 in MyProc
> WARNING: you don't own a lock of type AccessShareLock
> WARNING: ReleaseLockIfHeld: failed??
> WARNING: you don't own a lock of type ShareLock
> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode]
> > 0)", File: "lock.c", Line: 1240)
> LOG: server process (PID 10117) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
Ick.
Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
entries before? It's hard to know from here, but the 'during exception
cleanup' context indicates a problem in abort handling. Were there any
'deadlock detected' errors close by?
You're catching deadlock errors in a subtransaction. Hm.
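
For readers following along, a minimal sketch of the kind of pattern meant here, assuming the function wraps its UPDATE in a PL/pgSQL block with an EXCEPTION clause. The table definition and block body below are hypothetical; only the table and column names and the log message text come from the quoted output. In PL/pgSQL, a block with an EXCEPTION clause runs inside a subtransaction, and that subtransaction is rolled back before the handler executes.

```sql
-- Hypothetical stand-in for the reporter's table (the real definition is not shown).
CREATE TEMP TABLE cdsruntable (
    cdsruntableid bigint PRIMARY KEY,
    finished      timestamptz
);
INSERT INTO cdsruntable (cdsruntableid) VALUES (2020);

DO $$
BEGIN
    -- Entering a block that has an EXCEPTION clause starts a subtransaction.
    BEGIN
        UPDATE cdsruntable
           SET finished = clock_timestamp()
         WHERE cdsruntableid = 2020;
    EXCEPTION
        WHEN deadlock_detected THEN
            -- By the time the handler runs, the subtransaction has been
            -- aborted and the UPDATE's effects rolled back.
            RAISE LOG 'Handling deadlock detected on CdsRunTableId';
    END;
END;
$$;
```

In a single session this never deadlocks, so the handler is not reached; it only shows where subtransaction abort (and hence the 'during exception cleanup' context) would come into play.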
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services