Re: Potential data loss of 2PC files

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Potential data loss of 2PC files
Date: 2017-01-03 11:41:11
Message-ID: CAFjFpRf9WLZoeYGP+Ovhob7W_VP8F4VwTF7sXGb9Kz+t7YXYSg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 2:50 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Jan 3, 2017 at 3:32 PM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> I am wondering what happens if a 2PC file gets created, at the time of
>> checkpoint we flush the pg_twophase directory, then the file gets
>> removed. Do we need to flush the directory to ensure that the removal
>> persists? All the material I can find on fsync() of a directory gives
>> examples of file creation, not of file deletion. If we want to
>> persist the removal, we probably have to flush pg_twophase always or
>> add code to track whether any activity happened in pg_twophase between
>> two checkpoints. The latter seems a complication not worth the benefit.
>
> There is already the delay checkpoint machinery to cover timing
> problems here. Have a look, for example, at EndPrepare() in twophase.c.

Are you talking about
    /*
     * Now we can mark ourselves as out of the commit critical section: a
     * checkpoint starting after this will certainly see the gxact as a
     * candidate for fsyncing.
     */
    MyPgXact->delayChkpt = false;

That's while creating the file. I do not see similar code in
FinishPreparedTransaction() where the 2PC file is removed.
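
To make the question concrete, here is a minimal sketch of what "persisting the removal" could look like at the POSIX level: unlink the file, then fsync() the directory that contained it. This is for illustration only and is not code from twophase.c; the helper name remove_file_durably() and its arguments are hypothetical, and it assumes a platform where fsync() on a directory descriptor opened read-only is supported (some platforms reject it).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): remove "name" from "dir" and
 * then fsync the directory so that the removal of the directory entry
 * itself is flushed to stable storage.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int
remove_file_durably(const char *dir, const char *name)
{
    char path[4096];
    int  n;
    int  dirfd;
    int  rc;
    int  saved_errno;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
    {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Step 1: remove the directory entry. */
    if (unlink(path) != 0)
        return -1;

    /* Step 2: flush the directory that used to hold the entry. */
    dirfd = open(dir, O_RDONLY);
    if (dirfd < 0)
        return -1;

    rc = fsync(dirfd);

    /* Preserve the errno from fsync() across close(). */
    saved_errno = errno;
    close(dirfd);
    errno = saved_errno;

    return rc;
}
```

A caller would invoke it as remove_file_durably("pg_twophase", "<file name>"). Whether the extra directory fsync on every removal is acceptable, compared with flushing pg_twophase once per checkpoint, is exactly the trade-off in question.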

>
>> I guess it's hard to construct a case to reproduce the issue
>> described in your first mail, but I am still checking whether you have
>> a reproduction. Maybe we could use a similar reproduction to test the
>> deletion of a two-phase file.
>
> Not really. You can just do the test on a VM (one transaction
> generating a 2PC file, followed by a checkpoint), then kill -9 its
> parent instance. That's a radical way to emulate a power loss. I do
> that on macOS with VMware Fusion.
> --
> Michael

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
