Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
Date: 2014-09-25 20:41:18
Message-ID: 20140925204118.GE16581@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
> Hi Andres, Robert.
>
> I've attached four patches here.

Cool. Will review.

> 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
> earlier in StartupXLOG.
>
> 2. Inside that function, issue fsync()s for the main forks we create by
> copying the _init fork.

These two imo should definitely be backpatched.

> 3. A small fixup to add a const to "typedef char *FileName", because the
> earlier patch gave me warnings about discarding const-ness. This is
> consistent with many other functions in fd.c that take const char *.

I'm happy with consider that one for master (although I seem to recall having had
a patch for it rejected?), but I don't think we want to do that in the
backbranches.

> 4. Issue an fsync() on the data directory at startup if we need to
> perform crash recovery.

I personally think this one should be backpatched too. Does anyone
believe differently?

> From d8726c06cdf11674661eac1d091cf7edd05c2a0c Mon Sep 17 00:00:00 2001
> From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
> Date: Wed, 24 Sep 2014 14:43:18 +0530
> Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier
>
> We need to call this after recovery, but not after the END_OF_RECOVERY
> checkpoint is written. If we crash after that checkpoint, for example
> because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
> the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery
> again at startup.
>
> Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
> in the first cleanup, this leaves the database with a bunch of _init
> forks for unlogged relations, but no main forks, leading to scary
> errors.
>
> See thread from 20140912112246(dot)GA4984(at)alap3(dot)anarazel(dot)de for details.

With a explanation. Shiny!

> ---
> src/backend/access/transam/xlog.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 46eef5f..218f7fb 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6863,6 +6863,14 @@ StartupXLOG(void)
> ShutdownWalRcv();
>
> /*
> + * Reset initial contents of unlogged relations. This has to be done
> + * AFTER recovery is complete so that any unlogged relations created
> + * during recovery also get picked up.
> + */
> + if (InRecovery)
> + ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
> +

+

* Also recovery shouldn't be regarded successful if the reset fails -
* e.g. because of ENOSPC.

> +
> + /*
> + * copy_file() above has already called pg_flush_data() on the
> + * files it created. Now we need to fsync those files, because
> + * a checkpoint won't do it for us while we're in recovery.
> + */

+

* Doing this in a separate pass is advantageous for performance reasons
* because it allows the kernel to perform all the flushes at once.

> + /*
> + * If we need to perform crash recovery, we issue an fsync on the
> + * data directory to try to ensure that any data written before the
> + * crash are flushed to disk. Otherwise a power failure in the near
> + * future might mean that earlier unflushed writes are lost, but the
> + * more recent data written to disk from here on are persisted.
> + */
> +
> + if (ControlFile->state != DB_SHUTDOWNED &&
> + ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> + fsync_fname(data_directory, true);
> +
> if (ControlFile->state == DB_SHUTDOWNED)
> {
> /* This is the expected case, so don't be chatty in standalone mode */
> --

Unless I miss something this isn't sufficient. We need to fsync the
files in the data directory, not just the toplevel directory?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-09-25 20:48:31 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Robert Haas 2014-09-25 20:36:23 Re: B-Tree support function number 3 (strxfrm() optimization)