Re: Proposal to add a QNX 6.5 port to PostgreSQL

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Baker, Keith [OCDUS Non-J&J]" <KBaker9(at)its(dot)jnj(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add a QNX 6.5 port to PostgreSQL
Date: 2014-08-10 22:36:18
Message-ID: 20140810223618.GA220435@tornado.leadboat.com
Lists: pgsql-hackers

[Due for a new subject line?]

On Sat, Aug 09, 2014 at 08:16:01PM +0200, Andres Freund wrote:
> On 2014-08-09 14:09:36 -0400, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > On 2014-08-09 14:00:49 -0400, Tom Lane wrote:
> > >> I don't think it's anywhere near as black-and-white as you guys claim.
> > >> What it comes down to is whether allowing existing transactions/sessions
> > >> to finish is more important than allowing new sessions to start.
> > >> Depending on the application, either could be more important.
> >
> > > Nah. The current behaviour circumvents security measures we normally
> > > consider absolutely essential. If the postmaster died some bad shit went
> > > on. The likelihood of hitting corner case bugs where it's important that
> > > we react to a segfault/panic with a restart/crash replay is rather high.
> >
> > What's your point? Once a new postmaster starts, it *will* do a crash
> > restart, because certainly no shutdown checkpoint ever happened.
>
> That's not saying much. For one, there can be online checkpoints in that
> time. So it's certainly not guaranteed (or even all that likely) that
> all the WAL since the incident is replayed. For another, it can be
> *hours* before all the backends finish.
>
> IIRC we'll continue to happily write WAL and everything after postmaster
> (and possibly some backends, corrupting shmem) have crashed. The
> bgwriter, checkpointer, backends will continue to write dirty buffers to
> disk. We'll IIRC continue to write checkpoints. Those simply aren't
> things we should be doing after the postmaster has crashed, if we can
> avoid it at all.

The basic support processes, including the checkpointer, exit promptly upon
detecting a postmaster exit. Checkpoints cease. Your central point still
stands. WAL protects data integrity only to the extent that we stop writing
it after shared memory ceases to be trustworthy. Crash recovery of WAL
written based on corrupt buffers just reproduces the corruption.
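
To illustrate how a child process can notice promptly that its parent has
exited, here is a minimal sketch of one common technique. It is an
illustration only and is not taken from the PostgreSQL sources. The parent
holds the write end of a pipe open and never writes to it; the child watches
the read end, which reports end-of-file once every write end is closed. The
helper name parent_is_alive and the single-process demonstration are
assumptions made for this example.

```python
import os
import select


def parent_is_alive(read_fd: int, timeout: float = 0.0) -> bool:
    """Hypothetical helper: report whether the pipe's write end is still open."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        # Nothing to read and no EOF: some process still holds the write end.
        return True
    # The parent never writes, so a readable pipe normally means EOF (b"").
    return os.read(read_fd, 1) != b""


if __name__ == "__main__":
    # Single-process stand-in: closing write_fd plays the role of parent exit.
    read_fd, write_fd = os.pipe()
    print(parent_is_alive(read_fd))
    os.close(write_fd)
    print(parent_is_alive(read_fd))
    os.close(read_fd)
```

In a real parent/child arrangement the child would close its inherited copy
of the write end right after fork. The kernel then closes the parent's
descriptor on any exit, clean or not, and that is what produces the EOF.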

> > The
> > only issue here is what grace period existing orphaned backends are given
> > to finish their work --- and it's not possible for the answer to that
> > to be "zero", so you don't get to assume that nothing happens in
> > backend-land after the instant of postmaster crash.

Our grace period for active backends after unclean exit of one of their peers
is low, milliseconds to seconds. Our grace period for active backends after
unclean exit of the postmaster is unconstrained. At least one of those
policies has to be wrong. Like Andres and Robert, I pick the second one.
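
To make the grace-period question concrete, here is a small sketch of a
worker loop that gives itself a bounded window once it notices its parent is
gone. Everything in it is hypothetical: the names run_with_grace,
parent_alive and grace_seconds are invented for this example, and it does not
describe how PostgreSQL backends behave today. It also shows the sense in
which the period cannot be exactly zero: the loop only checks between work
items, so an item already in progress always finishes.

```python
import time
from typing import Any, Callable, Iterable, List


def run_with_grace(
    work_items: Iterable[Callable[[], Any]],
    parent_alive: Callable[[], bool],
    grace_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Run work items, stopping grace_seconds after parent death is noticed."""
    done: List[Any] = []
    deadline = None  # set once, the first time the parent is seen to be gone
    for item in work_items:
        if deadline is None and not parent_alive():
            deadline = clock() + grace_seconds
        if deadline is not None and clock() >= deadline:
            break  # grace period used up: abandon the remaining work
        done.append(item())
    return done
```

With grace_seconds set to 0 the loop stops at the first check after the
parent dies. With no deadline at all it would run until the work is done,
however long that takes, which is the unconstrained policy described above.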

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com
