Re: hung backends stuck in spinlock heavy endless loop

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hung backends stuck in spinlock heavy endless loop
Date: 2015-01-16 14:05:07
Message-ID: CAHyXU0yHgGHcwkS1HoHUQtgVsMvAM2pB_3qwc_OUNCm5efY3Lw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Running this test on another set of hardware to verify -- if this
>> turns out to be a false alarm which it may very well be, I can only
>> offer my apologies! I've never had a new drive fail like that, in
>> that manner. I'll burn the other hardware in overnight and report
>> back.

huh -- well, possibly not. This is on a virtual machine attached to a
SAN. It ran clean for several hours (this is 9.4 vanilla, asserts off,
checksums on), then started having issues:

[cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
verification failed, calculated checksum 59143 but expected 59137 at
character 20
[cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
DELETE FROM "onesitepmc"."propertyguestcard" t
WHERE EXISTS
(
SELECT 1 FROM "propertyguestcard_insert" d
WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
)

[cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL
function cdsreconciletable(text,text,text,text,boolean) line 197 at
EXECUTE statement
SQL statement "SELECT * FROM CDSReconcileTable(
t.CDSServer,
t.CDSDatabase,
t.SchemaName,
t.TableName)"
PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement

After that, several hours of clean running, followed by:

[cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data
to client: Broken pipe
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT
CDSReconcileRunTable(1160)
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
lock of type ShareLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
function cdsreconcileruntable(bigint) line 35 during exception cleanup
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT
CDSReconcileRunTable(1160)
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
AbortSubTransaction while in ABORT state
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
lock of type RowExclusiveLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
lock of type AccessExclusiveLock
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
AbortSubTransaction while in ABORT state
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
AbortSubTransaction while in ABORT state
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
lock of type RowExclusiveLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type RowShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type ExclusiveLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]ERROR: failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
AbortSubTransaction while in ABORT state
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: did not find
subXID 7553 in MyProc
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type RowExclusiveLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type ShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type AccessExclusiveLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type ShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type AccessShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING: you don't own a
lock of type RowShareLock
[cds2 32353 2015-01-16 04:40:58.028 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING: you don't own a
lock of type AccessExclusiveLock
[cds2 32353 2015-01-16 04:40:58.029 CST 7549]WARNING:
ReleaseLockIfHeld: failed??
[cds2 32353 2015-01-16 04:40:58.029 CST 7549]ERROR: failed to re-find
shared lock object
[cds2 32353 2015-01-16 04:40:58.029 CST 7549]PANIC:
ERRORDATA_STACK_SIZE exceeded
[ 3093 2015-01-16 04:41:00.299 CST 0]LOG: server process (PID 32353)
was terminated by signal 6: Aborted
[ 3093 2015-01-16 04:41:00.300 CST 0]LOG: terminating any other
active server processes

After that, the server resumed processing without further incident.

merlin
