Re: [HACKERS] emergency outage requiring database restart

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Oskari Saarenmaa <os(at)ohmu(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] emergency outage requiring database restart
Date: 2020-02-07 16:44:01
Message-ID: CAHyXU0xY8EUDRnXmeZ9OXD5EpM+vXfAuvOwJTUHNHpA-AV=L_Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
> > Regardless, it seems like you might be on to something, and I'm
> > inclined to patch your change, test it, and roll it out to production.
> > If it helps or at least narrows the problem down, we ought to give it
> > consideration for inclusion (unless someone else can think of a good
> > reason not to do that, heh!).
>
> Any results yet?

Not yet, but I do have some interesting findings. At this point I
do not think the problem is within pl/sh itself; rather, when a
process invoked from pl/sh misbehaves, that misbehavior can
penetrate into the database processes. I also believe that this
problem is fd related, so the 'close on exec' flag might reasonably
fix it. All cases of database damage I have observed remain
completely mitigated by enabling database checksums.
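
For reference, here is a minimal sketch of what setting 'close on exec'
on a descriptor looks like with POSIX fcntl(). This is not the actual
pl/sh patch; the helper name set_cloexec is hypothetical, and which
descriptors would need the flag is an assumption that depends on what
the backend has open at the time of the fork.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: mark fd so that it is closed automatically
 * when the process calls one of the exec() functions.  A child
 * started via fork() + exec() then no longer inherits it.
 *
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 */
static int
set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* read current descriptor flags */

    if (flags == -1)
        return -1;

    /* keep any existing flags, add FD_CLOEXEC */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

The flag only takes effect at exec() time, so it would have to be set
before the external program is launched. Where the platform supports
it, opening the file with O_CLOEXEC in the first place avoids the
window between open() and fcntl().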

Recently, a sqsh process kicked off via pl/sh crashed with signal 11,
but the database process was otherwise intact and fine. I think this
is strong supporting evidence for my points above. I've also
turned up a fairly reliable reproduction case from some unrelated
application changes. If I can demonstrate that the 'close on exec'
flag works and prevents these occurrences, we can close the book on
this.

merlin
