Re: Changeset Extraction v7.0 (was logical changeset generation)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.0 (was logical changeset generation)
Date: 2014-01-23 12:05:03
Message-ID: 20140123120503.GB7182@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-01-22 13:00:44 -0500, Robert Haas wrote:
> Well, apparently, one is going to PANIC and reinitialize the system.
> I presume that upon reinitialization we'll decide that the slot is
> gone, and thus won't recreate it in shared memory.

Yea, and if it's half-gone we'll continue deletion. And since yesterday
evening we'll even fsync things during startup to handle scenarios
similar to 20140122162115(dot)GL21170(at)alap3(dot)anarazel(dot)de .
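
To make the startup behaviour concrete, here is a minimal sketch of such a scan. It is not the code from the patch: the `.tmp` suffix as the "deletion has started" marker, the single `state` file per slot, and the names `startup_scan_slots`, `fsync_path` and `die` are all assumptions made for illustration, and `die()` merely stands in for a PANIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for a PANIC: report the failed call and abort. */
static void
die(const char *call, const char *path)
{
    fprintf(stderr, "PANIC: %s(\"%s\") failed: %s\n",
            call, path, strerror(errno));
    abort();
}

/* Flush a file or directory to disk, given its path. */
static void
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        die("open", path);
    if (fsync(fd) != 0)
        die("fsync", path);
    close(fd);
}

/* True if name ends in ".tmp" (hypothetical marker for a half-deleted slot). */
static int
is_half_deleted(const char *name)
{
    size_t len = strlen(name);

    return len > 4 && strcmp(name + len - 4, ".tmp") == 0;
}

/*
 * Scan slot_root at startup: finish deleting half-deleted slots and fsync
 * the surviving ones. Returns the number of slots whose deletion was finished.
 */
int
startup_scan_slots(const char *slot_root)
{
    DIR           *dir = opendir(slot_root);
    struct dirent *de;
    char           path[4096];
    char           state[4200];
    int            removed = 0;

    if (dir == NULL)
        die("opendir", slot_root);

    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(path, sizeof(path), "%s/%s", slot_root, de->d_name);
        snprintf(state, sizeof(state), "%s/state", path);

        if (is_half_deleted(de->d_name))
        {
            /* Deletion was interrupted: the state file may already be gone. */
            if (unlink(state) != 0 && errno != ENOENT)
                die("unlink", state);
            if (rmdir(path) != 0)
                die("rmdir", path);
            removed++;
        }
        else
        {
            /* Surviving slot: make sure its contents are really on disk. */
            fsync_path(state);
            fsync_path(path);
        }
    }
    closedir(dir);

    /* Persist the removals (and any earlier renames) in the parent. */
    fsync_path(slot_root);
    return removed;
}
```

Calling `startup_scan_slots()` on the slot directory before any slot is loaded into shared memory means a slot is either fully present and flushed, or fully gone, by the time anything can use it.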

> Of course, if the entire system suffers a hard power failure after
> that and before the directory is successfully fsync'd, then the slot
> could reappear on the next startup. Which is also exactly what would
> happen if we removed the slot from shared memory after doing the
> unlink, and then the system suffered a hard power failure before the
> directory contents made it to disk. Except that we also panicked.

Yes, but that could only happen while no relevant data has been lost,
since we hold the relevant locks during this.

> In the case of shared buffers, the way we handle fsync failures is by
> not allowing the system to checkpoint until all of the fsyncs succeed.

I don't think shared buffers fsyncs are the apt comparison. It's more
something like UpdateControlFile(). Which PANICs.

I really don't get why you fight PANICs in general that much. There are
some nasty PANICs in postgres which can happen in legitimate situations,
which should be made to fail more gracefully, but this surely isn't one
of them. We're doing rename(), unlink() and rmdir(). That's it.
We should concentrate on the ones that legitimately can happen, not the
ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
mount -o remount,ro /. We don't increase reliability one bit by adding
codepaths that will never get tested.
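
For reference, the rename(), unlink(), rmdir() sequence can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the `.tmp` suffix, the single `state` file per slot, and the names `drop_slot`, `fsync_path` and `die` are hypothetical, `die()` stands in for a PANIC, and treating a failed first rename() as an ordinary error (the slot is still fully intact at that point) is a choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for a PANIC: report the failed call and abort. */
static void
die(const char *call, const char *path)
{
    fprintf(stderr, "PANIC: %s(\"%s\") failed: %s\n",
            call, path, strerror(errno));
    abort();
}

/* Flush a file or directory to disk, given its path. */
static void
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        die("open", path);
    if (fsync(fd) != 0)
        die("fsync", path);
    close(fd);
}

/*
 * Drop the slot directory slot_root/name.
 * Returns 0 on success, -1 if the initial rename() failed (nothing changed).
 * Any failure after the rename aborts, because the slot is then half-gone.
 */
int
drop_slot(const char *slot_root, const char *name)
{
    char path[4096];
    char tmppath[4096];
    char state[4200];

    snprintf(path, sizeof(path), "%s/%s", slot_root, name);
    snprintf(tmppath, sizeof(tmppath), "%s/%s.tmp", slot_root, name);
    snprintf(state, sizeof(state), "%s/state", tmppath);

    /* Step 1: mark the slot as being deleted. Still recoverable on failure. */
    if (rename(path, tmppath) != 0)
        return -1;

    /* Make the rename durable before destroying anything. */
    fsync_path(slot_root);

    /* Step 2: remove the contents, then the directory itself. */
    if (unlink(state) != 0 && errno != ENOENT)
        die("unlink", state);
    if (rmdir(tmppath) != 0)
        die("rmdir", tmppath);

    /* Step 3: make the removal durable. */
    fsync_path(slot_root);
    return 0;
}
```

Each of the calls after the rename can realistically fail only when the filesystem itself is broken or has been tampered with, which is why the sketch aborts there and offers no recovery path of its own.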

> If there's an OS-level reset before that happens, WAL replay will
> perform the same buffer modifications over again and the next
> checkpoint will again try to flush them to disk and will not complete
> unless it does. That forms a closed system where we never advance the
> redo pointer over the covering WAL record until the changes it covers
> are on the disk. But I don't think this code has any similar
> interlock; if it does, I missed it.

No, it doesn't (until the first rename() at least), but the number of
failure scenarios is far smaller.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
