From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Potential data loss of 2PC files |
Date: | 2016-12-29 09:41:28 |
Message-ID: | CAFjFpRfvnvoGbO7R1i8SE7THUr2VR1avhg=NAV11qJg7iw7R3g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Dec 22, 2016 at 7:00 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Hi all,
>
> 2PC files are created using RecreateTwoPhaseFile() in two places currently:
> - at replay on a XLOG_XACT_PREPARE record.
> - At checkpoint with CheckPointTwoPhase().
>
> Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure
> that the 2PC files find their way into disk. But one piece is missing:
> the parent directory pg_twophase is not fsync'd. At replay this is
> more sensitive if there is a PREPARE record followed by a checkpoint
> record. If there is a power failure after the checkpoint completes
> there is a risk to lose 2PC status files here.
>
> It seems to me that we really should have CheckPointTwoPhase() call
> fsync() on pg_twophase to be sure that no files are lost here. There
> is no point to do this operation in RecreateTwoPhaseFile() as if there
> are many 2PC transactions to replay performance would be impacted, and
> we don't care about the durability of those files until a checkpoint
> moves the redo pointer. I have drafted the patch attached to address
> this issue.
I agree with this.
If no prepared transactions were required to be fsynced
CheckPointTwoPhase(), do we want to still fsync the directory?
Probably not.
May be you want to call fsync_fname(TWOPHASE_DIR, true); if
serialized_xacts > 0.
The comment just states what has been done earlier in the function,
which is documented in the prologue as well.
/*
* Make sure that the content created is persistent on disk to prevent
* data loss in case of an OS crash or a power failure. Each 2PC file
* has been already created and flushed to disk individually by
* RecreateTwoPhaseFile() using the list of GXACTs available for
* normal processing as well as at recovery when replaying individually
* each XLOG_XACT_PREPARE record.
*/
Instead, you may want to just say "If we have flushed any 2PC files,
flush the metadata in the pg_twophase directory."
Although, I thought that a simple case of creating a persistent table
which requires creating a file also would need to flush the directory.
I tried to locate the corresponding code to see if it also uses
fsync_fname(). I couldn't locate the code. I have looked at
mdcreate(), mdpreckpt(), mdpostckpt(). But may be that's not relevant
here.
>
> I am adding that as well to the next CF for consideration.
>
> Thoughts?
> --
> Michael
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2016-12-29 09:46:36 | Re: proposal: session server side variables |
Previous Message | Pavel Stehule | 2016-12-29 09:32:56 | Re: proposal: session server side variables |