Re: emergency outage requiring database restart

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: emergency outage requiring database restart
Date: 2016-10-25 17:52:29
Message-ID: CAHyXU0yUTcvLs5Hk+_p2iFNz0EXf3otbLDUA56BnEj0tN7zztA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Merlin Moncure wrote:
>> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>
>> > Corruption struck again.
>> > This time got another case of view busted -- attempting to create
>> > gives missing 'type' error.
>>
>> Call it a hunch -- I think the problem is in pl/sh.
>
> I've heard that before.

well, yeah, previously I had an issue where the database crashed
during a heavy concurrent pl/sh based load. However the problems
went away when I refactored the code. Anyways, I looked at the code
and couldn't see anything obviously wrong so who knows? All I know is
my production database is exploding continuously and I'm looking for
answers. The only other extension in heavy use on this servers is
postgres_fdw.

The other database on the cluster is fine, which kind of suggests we
are not facing clog or WAL type problems.

After last night, I rebuilt the cluster, turning on checksums, turning
on synchronous commit (it was off) and added a standby replica. This
should help narrow the problem down should it re-occur; if storage is
bad (note, other database on same machine is doing 10x write activity
and is fine) or something is scribbling on shared memory (my guess
here) then checksums should be popped, right?

merlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2016-10-25 17:57:27 Re: emergency outage requiring database restart
Previous Message Bruce Momjian 2016-10-25 17:32:04 Re: Wraparound warning