Re: reorder pg_rewind control file sync

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: reorder pg_rewind control file sync
Date: 2019-03-25 09:29:46
Message-ID: alpine.DEB.2.21.1903251013160.6866@lancre
Lists: pgsql-hackers


Hello Michaël,

>> The attached patch reorders the cluster fsyncing and control file changes in
>> "pg_rewind" so that the latter is done after all data are committed to disk,
>> so as to reflect the actual cluster status, similarly to what is done by
>> "pg_checksums", per discussion in the thread about offline enabling of
>> checksums:
>
> It would be an interesting property to see that it is possible to
> retry a rewind of a node which has been partially rewound already,
> but the operation failed in the middle.

Yes. I understand the question to be whether the warning in the pg_rewind
documentation can be partially lifted. The short answer is that it is not
obvious.

> Because that's the real deal here: as long as we know that its control
> file is in its previous state, we can rely on it for retrying the
> operation. Logically, I think that it should work, because we would
> still try to fetch the same blocks from the source server since WAL has
> forked by looking at the records of the target up from the last
> checkpoint before WAL has forked up to the last shutdown checkpoint, and
> the operation is lossy by design when it comes to dealing with file
> differences.
>
> Have you tried to see if pg_rewind is able to repeat its operation for
> specific scenarios?

I have run the regression tests. I'm not sure which scenarios are
covered there, but probably not an interruption in the middle of an fsync,
especially if fsync is usually disabled to ease the tests :-)

> One is for example a database created on the promoted standby, used as a
> source, and a second, different database created on the primary after
> the standby has been promoted. You could make the tool exit() before
> the rewind finishes, just before updating the control file, and see if
> the operation is repeatable. Interrupting the tool would be fine as
> well, though less controllable.
>
> It would be good to mention in the patch why the order matters.

Yep. This requires a careful analysis of pg_rewind's inner workings, which I
do not have time to do in the short term.
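
For illustration only, the ordering argument and the suggested interrupted-run
test can be sketched as a toy model. This is Python, not pg_rewind code: the
names toy_rewind, write_and_sync, read_control and crash_before_control, and
the "control" file layout, are all hypothetical, and the sketch does not claim
that a real rewind is repeatable. It only shows that if the control file is
written after the data files are synced, an interruption leaves the control
file in its previous state, so a retry is at least possible.

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write bytes to path and flush them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def toy_rewind(datadir, source_files, crash_before_control=False):
    """Toy model only: copy data first, update the control file last."""
    # Step 1: overwrite the target's data files and sync each of them.
    for name, data in source_files.items():
        write_and_sync(os.path.join(datadir, name), data)

    # Hypothetical fault injection, standing in for an exit() or a crash.
    if crash_before_control:
        raise RuntimeError("simulated interruption")

    # Step 2: only now record the new state in the (toy) control file.
    write_and_sync(os.path.join(datadir, "control"), b"rewound")


def read_control(datadir):
    """Return the raw contents of the toy control file."""
    with open(os.path.join(datadir, "control"), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as datadir:
        write_and_sync(os.path.join(datadir, "control"), b"previous")
        source = {"rel_1": b"block A", "rel_2": b"block B"}
        try:
            toy_rewind(datadir, source, crash_before_control=True)
        except RuntimeError:
            pass
        # The control file still describes the previous state, so retry.
        assert read_control(datadir) == b"previous"
        toy_rewind(datadir, source)
        assert read_control(datadir) == b"rewound"
```

The sketch leaves out syncing the containing directory and everything that
makes the real problem hard (WAL, partially copied blocks, file removals),
which is exactly the analysis that remains to be done.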

--
Fabien.
