Re: hung backends stuck in spinlock heavy endless loop

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: hung backends stuck in spinlock heavy endless loop
Date: 2015-01-28 18:26:01
Message-ID: CAHyXU0zRvKFAYozRsa_tm19zA2hAKiEUi0GU4xbWCvuyUaGFjw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 28, 2015 at 8:05 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Thu, Jan 22, 2015 at 3:50 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> I still haven't categorically ruled out pl/sh yet; that's something to
>> keep in mind.
>
> Well, after bisection proved not to be fruitful, I replaced the pl/sh
> calls with dummy calls that approximated the same behavior and the
> problem went away. So again, it looks like this might be a lot of
> false alarm. A pl/sh driven failure might still be interesting if
> it's coming from the internal calls it's making, so I'm still chasing
> it down.

...hm, I spoke too soon. So I deleted everything, booted up a new
vanilla 9.4 instance with asserts on, and took no other action.
Applying the script with no data activity fails an assertion every
single time:

mmoncure@mernix2 12:25 PM (REL9_4_STABLE) ~/src/p94$ cat /mnt/ssd/data/pg_log/postgresql-28.log
[ 12287 2015-01-28 12:24:24.080 CST 0]LOG: received smart shutdown request
[ 13516 2015-01-28 12:24:24.080 CST 0]LOG: autovacuum launcher shutting down
[ 13513 2015-01-28 12:24:24.081 CST 0]LOG: shutting down
[ 13513 2015-01-28 12:24:24.083 CST 0]LOG: database system is shut down
[ 14481 2015-01-28 12:24:25.127 CST 0]LOG: database system was shut down at 2015-01-28 12:24:24 CST
[ 14457 2015-01-28 12:24:25.129 CST 0]LOG: database system is ready to accept connections
[ 14485 2015-01-28 12:24:25.129 CST 0]LOG: autovacuum launcher started
TRAP: FailedAssertion("!(flags & 0x0010)", File: "dynahash.c", Line: 330)
[ 14457 2015-01-28 12:24:47.983 CST 0]LOG: server process (PID 14545) was terminated by signal 6: Aborted
[ 14457 2015-01-28 12:24:47.983 CST 0]DETAIL: Failed process was running: SELECT CDSStartRun()
[ 14457 2015-01-28 12:24:47.983 CST 0]LOG: terminating any other active server processes
[cds2 14546 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process
[cds2 14546 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[cds2 14546 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command.
[ 14485 2015-01-28 12:24:47.983 CST 0]WARNING: terminating connection because of crash of another server process
[ 14485 2015-01-28 12:24:47.983 CST 0]DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[ 14485 2015-01-28 12:24:47.983 CST 0]HINT: In a moment you should be able to reconnect to the database and repeat your command.
[ 14457 2015-01-28 12:24:47.984 CST 0]LOG: all server processes terminated; reinitializing
[ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was interrupted; last known up at 2015-01-28 12:24:25 CST
[ 14554 2015-01-28 12:24:47.995 CST 0]LOG: database system was not properly shut down; automatic recovery in progress
[ 14554 2015-01-28 12:24:47.997 CST 0]LOG: invalid magic number 0000 in log segment 000000010000000000000001, offset 13000704
[ 14554 2015-01-28 12:24:47.997 CST 0]LOG: redo is not required
[ 14457 2015-01-28 12:24:48.000 CST 0]LOG: database system is ready to accept connections
[ 14558 2015-01-28 12:24:48.000 CST 0]LOG: autovacuum launcher started

merlin
