
Re: Add timeline to partial WAL segments

From:	David Steele <david(at)pgmasters(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Add timeline to partial WAL segments
Date:	2018-12-14 23:05:18
Message-ID:	255e28de-4cef-2bb1-df43-d629c3f8e351@pgmasters.net
Lists:	pgsql-hackers

On 12/14/18 3:26 PM, Robert Haas wrote:
> On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
>>> The LSN switch point is often the same even when servers are going to
>>> different timelines. If the LSN is different enough then the problem
>>> solves itself since the .partial will be on an entirely different
>>> segment.
>>
>> That would mean that WAL forked exactly at the same record. You have
>> likely seen more cases where that can happen in real life than I have.
>
> Suppose that the original master fails during an idle period, and we
> promote a slave. But we accidentally promote a slave that can't serve
> as the new master, like because it's in a datacenter with an
> unreliable network connection or one which is about to be engulfed in
> lava.

Much more common than people think.

> So, we go to promote a different slave, and because we never
> got around to reconfiguring the standbys to follow the previous
> promotion, kaboom.

Exactly.

> Or, suppose we do PITR to recover from some user error, but then
> somebody screws up the contents of the recovered cluster and we have
> to do it over again. Naturally we'll recover to the same point.
>
> The new TLI is the only thing that is guaranteed to be unique with
> each new promotion, and I would guess that it is therefore the right
> thing to use to disambiguate them.

This is the conclusion we came to after a few months of diagnosing and
working on this problem.
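A minimal Python sketch of the collision being discussed. It assumes the default 16 MB segment size and the standard 24-hex-digit segment name (timeline, log, segment). The `.NNNNNNNN.partial` form that appends the new timeline is a hypothetical format for illustration only, not necessarily the naming the patch in this thread uses.

```python
# Assumption: default 16 MB WAL segments (the size is configurable at initdb time).
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE  # 256 segments per "log" for 16 MB


def parse_lsn(text: str) -> int:
    """Convert an LSN in the usual 'X/Y' hex text form to an integer byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the segment containing `lsn` on timeline `tli` (three 8-hex-digit fields)."""
    segno = lsn // WAL_SEG_SIZE
    return "%08X%08X%08X" % (tli, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)


def partial_name_current(old_tli: int, switch_lsn: int) -> str:
    """Current behavior: the .partial name carries only the OLD timeline's segment name."""
    return wal_file_name(old_tli, switch_lsn) + ".partial"


def partial_name_with_new_tli(old_tli: int, new_tli: int, switch_lsn: int) -> str:
    """Hypothetical scheme: also embed the NEW timeline, which is unique per promotion."""
    return "%s.%08X.partial" % (wal_file_name(old_tli, switch_lsn), new_tli)


if __name__ == "__main__":
    switch = parse_lsn("0/3000060")
    # Two promotions off timeline 1 at the same switch point (to TLI 2, then TLI 3).
    print(partial_name_current(1, switch))          # identical for both promotions
    print(partial_name_with_new_tli(1, 2, switch))  # distinct per promotion
    print(partial_name_with_new_tli(1, 3, switch))
    # A switch point in a different segment gets a different name anyway.
    print(partial_name_current(1, parse_lsn("0/4000028")))
```

With the current scheme both promotions try to archive `000000010000000000000003.partial`, so the second one either fails in `archive_command` or overwrites the first. Only when the switch LSNs fall in different segments do the names differ on their own, which matches the point above that the problem only "solves itself" when the LSN is different enough.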

The question in my mind: is it safe to back-patch?

--
-David
david(at)pgmasters(dot)net

In response to

Re: Add timeline to partial WAL segments at 2018-12-14 20:26:19 from Robert Haas

Responses

Re: Add timeline to partial WAL segments at 2018-12-14 23:56:27 from Michael Paquier
Re: Add timeline to partial WAL segments at 2018-12-20 20:56:12 from Robert Haas
