emergency outage requiring database restart

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: emergency outage requiring database restart
Date: 2016-10-13 21:07:46
Message-ID: CAHyXU0xL7qFCkHt45mfP6VdwoN_j4x8DoX8-_xhSuJ8re7f96g@mail.gmail.com
Lists: pgsql-hackers

Today I had an emergency production outage on a server. This
particular server was running 9.5.2. The symptoms were interesting,
so I thought I'd report them. Here is what I saw:

*) User CPU was pegged at 100%
*) Queries reading data would block and not respond to cancel or terminate
*) pg_stat_activity reported no waiting queries (but worked fine otherwise)

Adding all this up, it smells like processes were getting stuck on a spinlock.

Connections quickly got eaten up and the situation was desperately
urgent, so I punted and did an immediate restart, and things came back
normally. I had a console to the database and did manage to grab the
contents of pg_stat_activity, and noticed several trivial queries were
running normally (according to pg_stat_activity) but were otherwise
stuck. Attempting to run one of them myself, I noted the query got
stuck and did not cancel. I was in a terrible rush but am casting
around for stuff to grab in case that happens again -- 'perf top'
would be a natural choice, I guess.
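
A minimal sketch of a capture script for a recurrence, not something
that was run during this outage. It assumes psql, perf, and gdb are
installed and that the caller has privileges to profile and attach to
backends. The helper names (build_capture_commands, capture), the
/tmp/pg_hang output directory, and the 10-second sampling window are
all hypothetical choices. It uses 'perf record' rather than the
interactive 'perf top' so that the profile can be saved and read after
the restart.

```python
import subprocess
from pathlib import Path


def build_capture_commands(backend_pids, out_dir="/tmp/pg_hang"):
    """Return (label, argv) pairs describing what to collect.

    Nothing is executed here; out_dir and the labels are illustrative.
    """
    out = Path(out_dir)
    commands = [
        # Snapshot of what the server believes each backend is doing.
        ("activity", ["psql", "-X", "-A", "-c",
                      "SELECT * FROM pg_stat_activity"]),
        # System-wide profile with call graphs, sampled for 10 seconds.
        ("perf", ["perf", "record", "-a", "-g",
                  "-o", str(out / "perf.data"), "--", "sleep", "10"]),
    ]
    for pid in backend_pids:
        # One backtrace per stuck backend; gdb detaches when it exits.
        commands.append((f"stack_{pid}",
                         ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]))
    return commands


def capture(backend_pids, out_dir="/tmp/pg_hang", timeout=30):
    """Run each command and save its output, never blocking past timeout."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for label, argv in build_capture_commands(backend_pids, out_dir):
        try:
            result = subprocess.run(argv, capture_output=True,
                                    text=True, timeout=timeout)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung psql should not stop the rest.
            text = f"failed: {exc!r}\n"
        (out / f"{label}.txt").write_text(text)
```

Calling capture() with the pids of a few stuck backends would leave
one text file per command in the output directory. The timeout matters
because, with connections exhausted, psql itself may never return.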

Three autovacuum processes were running. Obviously I'm going to do a
bugfix upgrade, but I was wondering if anybody has seen anything like
this. This particular server was upgraded to 9.5 somewhat recently but
ran on 9.2 for years with no issues.

merlin