Re: Switching timeline over streaming replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Switching timeline over streaming replication
Date: 2012-12-15 00:36:19
Message-ID: CAHGQGwFNsGzwBPu7V=tz0DghtQ3F298eaY4BRbUX_V7hZH2Qbg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 8, 2012 at 12:51 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 06.12.2012 15:39, Amit Kapila wrote:
>>
>> On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
>>>
>>> On 05.12.2012 14:32, Amit Kapila wrote:
>>>>
>>>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>>>>>
>>>>> After some diversions to fix bugs and refactor existing code, I've
>>>>> committed a couple of small parts of this patch, which just add some
>>>>> sanity checks to notice incorrect PITR scenarios. Here's a new
>>>>> version of the main patch based on current HEAD.
>>>>
>>>>
>>>> After testing with the new patch, the following problems are observed.
>>>>
>>>> Defect - 1:
>>>>
>>>> 1. start primary A
>>>> 2. start standby B following A
>>>> 3. start cascade standby C following B.
>>>> 4. start another standby D following C.
>>>> 5. Promote standby B.
>>>> 6. After successful time line switch in cascade standby C & D,
>>>>    stop D.
>>>> 7. Restart D, Startup is successful and connecting to standby C.
>>>> 8. Stop C.
>>>> 9. Restart C, startup is failing.
>>>
>>>
>>> Ok, the error I get in that scenario is:
>>>
>>> C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not
>>>   contain minimum recovery point 0/3023F08 on timeline 1
>>> C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited
>>>   with exit code 1
>>> C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup
>>>   process failure
>>>
>>
>>>
>>> That mismatch causes the error. I'd like to fix this by always treating
>>> the checkpoint record to be part of the new timeline. That feels more
>>> correct. The most straightforward way to implement that would be to peek
>>> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
>>> it's a checkpoint record that changes TLI, set replayEndTLI to the new
>>> timeline before calling the redo-function. But it's a bit of a
>>> modularity violation to peek into the record like that.
>>>
>>> Or we could just revert the sanity check at beginning of recovery that
>>> throws the "requested timeline 2 does not contain minimum recovery point
>>> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
>>> record that says "unexpected timeline ID %u in checkpoint record, before
>>> reaching minimum recovery point %X/%X on timeline %u" checks basically
>>> the same thing, but at a later stage. However, the way
>>> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
>>> fix that.
>>>
>>> I'm thinking of something like the attached (with some more comments
>>> before committing). Thoughts?
>>
>>
>> This has fixed the problem reported.
>> However, I cannot tell whether there would be any problem if we remove
>> the check "requested timeline 2 does not contain minimum recovery point
>> 0/3023F08 on timeline 1" at the beginning of recovery and just update
>> replayEndTLI with ThisTimeLineID?
>
>
> Well, it seems wrong for the control file to contain a situation like this:
>
> pg_control version number: 932
> Catalog version number: 201211281
> Database system identifier: 5819228770976387006
> Database cluster state: shut down in recovery
> pg_control last modified: pe 7. joulukuuta 2012 17.39.57
> Latest checkpoint location: 0/3023EA8
> Prior checkpoint location: 0/2000060
> Latest checkpoint's REDO location: 0/3023EA8
> Latest checkpoint's REDO WAL file: 000000020000000000000003
> Latest checkpoint's TimeLineID: 2
> ...
> Time of latest checkpoint: pe 7. joulukuuta 2012 17.39.49
> Min recovery ending location: 0/3023F08
> Min recovery ending loc's timeline: 1
>
> Note the latest checkpoint location and its TimelineID, and compare them
> with the min recovery ending location. The min recovery ending location is
> ahead of latest checkpoint's location; the min recovery ending location
> actually points to the end of the checkpoint record. But how come the min
> recovery ending location's timeline is 1, while the checkpoint record's
> timeline is 2?
>
> Now maybe that would happen to work if we remove the sanity check, but it still
> seems horribly confusing. I'm afraid that discrepancy will come back to
> haunt us later if we leave it like that. So I'd like to fix that.
>
> Mulling over this some more, I propose the attached patch. With the
> patch, we peek into the checkpoint record, and actually perform the timeline
> switch (by changing ThisTimeLineID) before replaying it. That way the
> checkpoint record is really considered to be on the new timeline for all
> purposes. At the moment, the only difference that makes in practice is that
> we set replayEndTLI, and thus minRecoveryPointTLI, to the new TLI, but it
> feels logically more correct to do it that way.

This patch has already been included in HEAD. Right?
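
For reference, the ordering described above can be modelled roughly as
follows. This is only a toy Python sketch, not the code in xlog.c: the
class and function names are hypothetical, and the `peek` flag is just a
device for comparing the two orderings side by side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a WAL record."""
    end_lsn: int                          # location just past the record
    checkpoint_tli: Optional[int] = None  # set only for checkpoint records


@dataclass
class ReplayState:
    """Hypothetical stand-in for the startup process's replay state."""
    this_tli: int
    replay_end_lsn: int = 0
    replay_end_tli: int = 0


def replay(state: ReplayState, record: Record, peek: bool = True) -> ReplayState:
    """Replay one record, optionally switching timeline before recording it."""
    is_switch = (record.checkpoint_tli is not None
                 and record.checkpoint_tli != state.this_tli)

    # Proposed ordering: peek at the checkpoint record and switch first, so
    # the record counts as part of the new timeline.
    if peek and is_switch:
        state.this_tli = record.checkpoint_tli

    # Remember how far replay has got and on which timeline.
    state.replay_end_lsn = record.end_lsn
    state.replay_end_tli = state.this_tli

    # Stand-in for the redo function: without peeking, the switch only
    # happens here, after the end-of-replay timeline was already recorded.
    if is_switch:
        state.this_tli = record.checkpoint_tli
    return state


checkpoint = Record(end_lsn=0x3023F08, checkpoint_tli=2)
for peek in (False, True):
    s = replay(ReplayState(this_tli=1), checkpoint, peek=peek)
    print(peek, hex(s.replay_end_lsn), s.replay_end_tli)
```

In this model, peek=False leaves the record ending at 0/3023F08 tagged
with timeline 1, which is the combination shown in the control file
above, while peek=True tags it with timeline 2.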

I found another "requested timeline does not contain minimum recovery point"
error scenario in HEAD:

1. Set up the master 'M', one standby 'S1', and one cascade standby 'S2'.
2. Shut down the master 'M', promote the standby 'S1', and wait for 'S2'
   to reconnect to 'S1'.
3. Set up a new cascade standby 'S3' connecting to 'S2'.

Then 'S3' fails to start recovery because of the following error:

FATAL: requested timeline 2 does not contain minimum recovery
point 0/3000000 on timeline 1
LOG: startup process (PID 33104) exited with exit code 1
LOG: aborting startup due to startup process failure

The result of pg_controldata of 'S3' is:

Latest checkpoint location: 0/3000088
Prior checkpoint location: 0/2000060
Latest checkpoint's REDO location: 0/3000088
Latest checkpoint's REDO WAL file: 000000020000000000000003
Latest checkpoint's TimeLineID: 2
<snip>
Min recovery ending location: 0/3000000
Min recovery ending loc's timeline: 1
Backup start location: 0/0
Backup end location: 0/0

The content of the timeline history file '00000002.history' is:

1 0/3000088 no recovery target specified
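
To make the values above easier to compare, here is a minimal Python
sketch that reads the history line and maps a WAL location to a
timeline. It is not the check the server performs. The function names
are hypothetical, and two things are assumptions: that each history line
is "parent TLI, switch point, reason", as in the single line above, and
that a location equal to the switch point belongs to the new timeline.

```python
def parse_lsn(text):
    """Convert a WAL location written as 'hi/lo' (hex) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(text):
    """Return [(parent_tli, switch_lsn), ...] from history file text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        fields = line.split(None, 2)  # TLI, switch point, free-text reason
        entries.append((int(fields[0]), parse_lsn(fields[1])))
    return entries


def tli_of_point(lsn, history, target_tli):
    """Timeline owning `lsn` when following `history` up to `target_tli`."""
    for tli, switch_lsn in history:
        if lsn < switch_lsn:
            return tli
    return target_tli


history = parse_history("1 0/3000088 no recovery target specified")
for name, loc in [("min recovery ending location", "0/3000000"),
                  ("latest checkpoint location", "0/3000088")]:
    print(name, loc, "-> timeline", tli_of_point(parse_lsn(loc), history, 2))
```

Under that reading, 0/3000000 lies before the switch point and maps to
timeline 1, while the checkpoint location 0/3000088 maps to timeline 2.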

Regards,

--
Fujii Masao
