Re: emergency outage requiring database restart

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2016-10-28 20:16:05
Message-ID: 6d8cc07c-8504-0d2a-d088-fe47a407228f@BlueTreble.com
Lists: pgsql-hackers

On 10/28/16 8:23 AM, Merlin Moncure wrote:
> On Thu, Oct 27, 2016 at 6:39 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> On Thu, Oct 27, 2016 at 9:53 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> I think we can rule out faulty storage
>>
>> Nobody ever expects the faulty storage

LOL

> Believe me, I know. But the evidence points elsewhere in this case;
> this is clearly application driven.

FWIW, just because it's triggered by specific application behavior
doesn't mean there isn't a storage bug. That's what makes data
corruption bugs such a joy to figure out.

BTW, if you haven't already, I would reset all your storage-related
options and GUCs to safe defaults... plain old fsync, no cute journal /
FS / mount options, etc. Maybe this is related to the app, but the most
helpful thing right now is to find some kind of safe config so you can
start bisecting.

I would also consider alternatives to plsh, just to rule it out if
nothing else. I'd certainly look at some way to get sqsh out of the loop
(again, just to get something that doesn't crash). First idea that comes
to mind is a stand-alone shell script that watches a named pipe for a
filename; when it gets that file it runs it with sqsh and does something
to signal completion.
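
A rough sketch of that named-pipe watcher, written in Python rather than
shell so it's easier to test. Everything here is an assumption for
illustration: the name watch_pipe, the ".done" marker file used to signal
completion, and the default "sqsh -i <file>" invocation (you'd need to add
your own server / login flags).

```python
import os
import subprocess


def watch_pipe(fifo_path, command=("sqsh", "-i"), max_jobs=None):
    """Read script filenames from a named pipe and run each one.

    `command` is the program (plus leading arguments) that receives the
    filename as its last argument. The default is a guess at a sqsh
    invocation; adjust it for your environment.
    """
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)

    handled = 0
    while max_jobs is None or handled < max_jobs:
        # open() blocks until a writer connects; the loop below ends
        # when that writer closes its end of the pipe.
        with open(fifo_path) as fifo:
            for line in fifo:
                script = line.strip()
                if not script:
                    continue
                result = subprocess.run([*command, script])
                # Signal completion (hypothetical convention): write the
                # exit status to "<script>.done" for the caller to poll.
                with open(script + ".done", "w") as marker:
                    marker.write(f"{result.returncode}\n")
                handled += 1
                if max_jobs is not None and handled >= max_jobs:
                    break
    return handled
```

You'd start it with something like watch_pipe("/tmp/sqsh_jobs") in its own
process, and the database side would only ever write a filename to the pipe
and wait for the marker file, so neither plsh nor sqsh runs inside a
backend.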
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
