Re: Potential data loss of 2PC files

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Potential data loss of 2PC files
Date: 2016-12-29 09:41:28
Message-ID: CAFjFpRfvnvoGbO7R1i8SE7THUr2VR1avhg=NAV11qJg7iw7R3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 22, 2016 at 7:00 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> 2PC files are created using RecreateTwoPhaseFile() in two places currently:
> - at replay on a XLOG_XACT_PREPARE record.
> - At checkpoint with CheckPointTwoPhase().
>
> Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure
> that the 2PC files find their way into disk. But one piece is missing:
> the parent directory pg_twophase is not fsync'd. At replay this is
> more sensitive if there is a PREPARE record followed by a checkpoint
> record. If there is a power failure after the checkpoint completes
> there is a risk to lose 2PC status files here.
>
> It seems to me that we really should have CheckPointTwoPhase() call
> fsync() on pg_twophase to be sure that no files are lost here. There
> is no point to do this operation in RecreateTwoPhaseFile() as if there
> are many 2PC transactions to replay performance would be impacted, and
> we don't care about the durability of those files until a checkpoint
> moves the redo pointer. I have drafted the patch attached to address
> this issue.

I agree with this.
If no prepared transactions were required to be fsynced
CheckPointTwoPhase(), do we want to still fsync the directory?
Probably not.

May be you want to call fsync_fname(TWOPHASE_DIR, true); if
serialized_xacts > 0.

The comment just states what has been done earlier in the function,
which is documented in the prologue as well.
/*
* Make sure that the content created is persistent on disk to prevent
* data loss in case of an OS crash or a power failure. Each 2PC file
* has been already created and flushed to disk individually by
* RecreateTwoPhaseFile() using the list of GXACTs available for
* normal processing as well as at recovery when replaying individually
* each XLOG_XACT_PREPARE record.
*/
Instead, you may want to just say "If we have flushed any 2PC files,
flush the metadata in the pg_twophase directory."

Although, I thought that a simple case of creating a persistent table
which requires creating a file also would need to flush the directory.
I tried to locate the corresponding code to see if it also uses
fsync_fname(). I couldn't locate the code. I have looked at
mdcreate(), mdpreckpt(), mdpostckpt(). But may be that's not relevant
here.

>
> I am adding that as well to the next CF for consideration.
>
> Thoughts?
> --
> Michael
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-12-29 09:46:36 Re: proposal: session server side variables
Previous Message Pavel Stehule 2016-12-29 09:32:56 Re: proposal: session server side variables