Re: Add timeline to partial WAL segments

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add timeline to partial WAL segments
Date: 2018-12-13 00:17:12
Message-ID: 20181213001712.GD9437@paquier.xyz
Lists: pgsql-hackers

On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote:
> The LSN switch point is often the same even when servers are going to
> different timelines. If the LSN is different enough then the problem
> solves itself since the .partial will be on an entirely different
> segment.

That would mean that WAL forked exactly at the same record. You have
likely seen more cases where that can happen in real life than I have.
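
To make the collision condition concrete, here is a minimal sketch,
assuming the default 16MB wal_segment_size. The helper names are
illustrative only and are not PostgreSQL functions. Two promotions from
the same timeline can only compete for the same .partial name when their
switch points land in the same segment:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(lsn: str) -> int:
    """Convert an LSN in 'X/Y' hex notation to a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_number(lsn: str) -> int:
    """Return the number of the WAL segment that contains the given LSN."""
    return parse_lsn(lsn) // WAL_SEGMENT_SIZE


def same_partial_segment(lsn_a: str, lsn_b: str) -> bool:
    """True if both switch points fall inside the same WAL segment."""
    return segment_number(lsn_a) == segment_number(lsn_b)
```

Switch points at 1/1000028 and 1/1FFFFF0 share a segment, so both
servers would produce the same .partial name. A switch point at
1/2000000 is already in the next segment.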

> But, we could at least use the . notation and end up with something like
> 000000010000000100000001.00000002.partial or perhaps
> 000000010000000100000001.T00000002.partial? Maybe
> 000000010000000100000001.00000002.tpartial?
>
> I can't decide whether the .partial files generated by pg_receivewal are
> a problem or not. It seems to me that only the original cluster is
> going to be able to stream this file -- once the segment is renamed to
> .partial it won't be visible to pg_receivexlog. So, .partial without a
> timeline extension just means partial from the original timeline.

Still, this does not actually solve the problem when two servers are
trying to use the same timeline, does it? You would get conflicts with
the history file as well in this case, but the partial segment gets
archived first. It seems to me that it is kind of difficult to come up
with a totally bullet-proof solution. Adding the timeline is appealing
if the history files can be added in time. The switchpoint LSN is also
appealing, given how unlikely it is for WAL to fork at exactly the same
record on two servers. Perhaps another solution could be to add both,
but that looks like overkill.
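
As a rough sketch of the naming options, assuming the dotted format
suggested above for the timeline variant: the "switch_lsn" and "both"
layouts below are purely hypothetical, as is the function name, and
nothing here is an existing or agreed-upon format.

```python
def partial_name_candidates(segment: str, new_tli: int, switch_lsn: str) -> dict:
    """Build candidate .partial names for one promotion.

    segment    -- 24-character WAL segment file name
    new_tli    -- timeline the promoted server switches to
    switch_lsn -- switch point in 'X/Y' hex notation
    """
    hi, lo = switch_lsn.split("/")
    tli = f"{new_tli:08X}"
    # Hypothetical encoding: both LSN halves zero-padded to 8 hex digits.
    lsn = f"{int(hi, 16):08X}{int(lo, 16):08X}"
    return {
        "current": f"{segment}.partial",
        "timeline": f"{segment}.{tli}.partial",
        "switch_lsn": f"{segment}.{lsn}.partial",
        "both": f"{segment}.{tli}.{lsn}.partial",
    }


# Two servers both picking timeline 2, forking at different records.
a = partial_name_candidates("000000010000000100000001", 2, "1/1000028")
b = partial_name_candidates("000000010000000100000001", 2, "1/1000060")
```

Here a["timeline"] and b["timeline"] are identical, so the timeline
suffix alone does not help once both servers choose the same timeline,
while a["switch_lsn"] and b["switch_lsn"] differ. If both servers fork
at the same record, only a differing timeline tells the names apart.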

> There's another difference. The .partial generated by pg_receivewal is
> an actively-worked-on file whereas the .partial generated by a promotion
> is probably headed for oblivion. I haven't seen a single case where one
> was used in an actual recovery (which doesn't mean it hasn't happened,
> of course).

There are many people implementing their own backup solutions, so it is
hard to say that none of those solutions is actually able to copy the
last partial file to replay up to the end of a wanted timeline for a
PITR.

>> (I am completely sold on the idea of prioritizing file types in the
>> archiver.)
>
> OK, I'll work up a patch for that, then. It doesn't solve the .partial
> problem, but it does reduce the likelihood that two servers will end up
> on the same timeline.
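
The prioritization idea could look roughly like the sketch below. This
is only an illustration of the ordering, not the patch itself: the
function names are made up, and it assumes the archiver is handed a
plain list of file names that are ready to be archived.

```python
def archive_priority(filename: str) -> tuple:
    """Sort key: timeline history files first, then the rest by name."""
    is_history = filename.endswith(".history")
    return (0 if is_history else 1, filename)


def next_to_archive(ready_files):
    """Pick the next file to archive, or None if nothing is ready."""
    if not ready_files:
        return None
    return min(ready_files, key=archive_priority)


ready = [
    "000000010000000100000001.partial",
    "00000002.history",
    "000000010000000100000000",
]
```

With plain name ordering, min(ready) picks the completed segment first
and the history file waits behind it. With the priority key,
next_to_archive(ready) returns 00000002.history, so another server
would see that timeline 2 is taken sooner.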

Thanks a lot, David!
--
Michael
