Re: Switching timeline over streaming replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Switching timeline over streaming replication
Date: 2012-12-20 16:48:35
Message-ID: CAHGQGwHR790c5j7SSJwqjPfjRnWvMfrGPgpXT=Gxv4ghxGL9_Q@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 15, 2012 at 9:36 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sat, Dec 8, 2012 at 12:51 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> On 06.12.2012 15:39, Amit Kapila wrote:
>>>
>>> On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
>>>>
>>>> On 05.12.2012 14:32, Amit Kapila wrote:
>>>>>
>>>>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>>>>>>
>>>>>> After some diversions to fix bugs and refactor existing code, I've
>>>>>> committed a couple of small parts of this patch, which just add some
>>>>>> sanity checks to notice incorrect PITR scenarios. Here's a new
>>>>>> version of the main patch based on current HEAD.
>>>>>
>>>>>
>>>>> After testing with the new patch, the following problems are observed.
>>>>>
>>>>> Defect - 1:
>>>>>
>>>>> 1. start primary A
>>>>> 2. start standby B following A
>>>>> 3. start cascade standby C following B.
>>>>> 4. start another standby D following C.
>>>>> 5. Promote standby B.
>>>>> 6. After successful time line switch in cascade standby C & D, stop D.
>>>>> 7. Restart D, Startup is successful and connecting to standby C.
>>>>> 8. Stop C.
>>>>> 9. Restart C, startup is failing.
>>>>
>>>>
>>>> Ok, the error I get in that scenario is:
>>>>
>>>> C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
>>>> C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1
>>>> C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure
>>>>
>>>
>>>>
>>>> That mismatch causes the error. I'd like to fix this by always treating
>>>> the checkpoint record to be part of the new timeline. That feels more
>>>> correct. The most straightforward way to implement that would be to peek
>>>> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
>>>> it's a checkpoint record that changes TLI, set replayEndTLI to the new
>>>> timeline before calling the redo-function. But it's a bit of a
>>>> modularity violation to peek into the record like that.
>>>>
>>>> Or we could just revert the sanity check at beginning of recovery that
>>>> throws the "requested timeline 2 does not contain minimum recovery point
>>>> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
>>>> record that says "unexpected timeline ID %u in checkpoint record, before
>>>> reaching minimum recovery point %X/%X on timeline %u" checks basically
>>>> the same thing, but at a later stage. However, the way
>>>> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
>>>> fix that.
>>>>
>>>> I'm thinking of something like the attached (with some more comments
>>>> before committing). Thoughts?
>>>
>>>
>>> This has fixed the problem reported.
>>> However, I cannot work out whether there would be any problem if we
>>> remove the check "requested timeline 2 does not contain minimum recovery
>>> point 0/3023F08 on timeline 1" at the beginning of recovery and just
>>> update replayEndTLI with ThisTimeLineID?
>>
>>
>> Well, it seems wrong for the control file to contain a situation like this:
>>
>> pg_control version number: 932
>> Catalog version number: 201211281
>> Database system identifier: 5819228770976387006
>> Database cluster state: shut down in recovery
>> pg_control last modified: pe 7. joulukuuta 2012 17.39.57
>> Latest checkpoint location: 0/3023EA8
>> Prior checkpoint location: 0/2000060
>> Latest checkpoint's REDO location: 0/3023EA8
>> Latest checkpoint's REDO WAL file: 000000020000000000000003
>> Latest checkpoint's TimeLineID: 2
>> ...
>> Time of latest checkpoint: pe 7. joulukuuta 2012 17.39.49
>> Min recovery ending location: 0/3023F08
>> Min recovery ending loc's timeline: 1
>>
>> Note the latest checkpoint location and its TimelineID, and compare them
>> with the min recovery ending location. The min recovery ending location is
>> ahead of latest checkpoint's location; the min recovery ending location
>> actually points to the end of the checkpoint record. But how come the min
>> recovery ending location's timeline is 1, while the checkpoint record's
>> timeline is 2?
>>
>> Now maybe that would happen to work if we remove the sanity check, but it still
>> seems horribly confusing. I'm afraid that discrepancy will come back to
>> haunt us later if we leave it like that. So I'd like to fix that.
>>
>> Mulling over this for some more, I propose the attached patch. With the
>> patch, we peek into the checkpoint record, and actually perform the timeline
>> switch (by changing ThisTimeLineID) before replaying it. That way the
>> checkpoint record is really considered to be on the new timeline for all
>> purposes. At the moment, the only difference that makes in practice is that
>> we set replayEndTLI, and thus minRecoveryPointTLI, to the new TLI, but it
>> feels logically more correct to do it that way.
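
The ordering described above can be illustrated with a toy model. This is only a sketch, not PostgreSQL's recovery code: the `Record` class, the `replay` function, and the record layout are hypothetical names made up for illustration. It shows that switching the timeline before the replay-end position is updated makes the recorded timeline match the checkpoint record's timeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical, simplified WAL record (not PostgreSQL's real layout)."""
    end_lsn: int                          # position just past the record
    checkpoint_tli: Optional[int] = None  # set only for checkpoint records


def replay(records: List[Record], start_tli: int, peek: bool) -> Tuple[int, int]:
    """Replay records and return (replay_end_lsn, replay_end_tli).

    With peek=True the timeline switch happens before the replay-end
    position is updated, so a timeline-changing checkpoint record is
    attributed to the new timeline.
    """
    this_tli = start_tli
    replay_end_lsn, replay_end_tli = 0, start_tli
    for rec in records:
        switches = rec.checkpoint_tli is not None and rec.checkpoint_tli != this_tli
        if peek and switches:
            this_tli = rec.checkpoint_tli   # switch before replaying
        replay_end_lsn, replay_end_tli = rec.end_lsn, this_tli
        if not peek and switches:
            this_tli = rec.checkpoint_tli   # switch only while redoing the record
    return replay_end_lsn, replay_end_tli


# A record ending where the checkpoint record starts (0/3023EA8), then the
# checkpoint record itself, which ends at 0/3023F08 and carries timeline 2.
wal = [Record(end_lsn=0x3023EA8), Record(end_lsn=0x3023F08, checkpoint_tli=2)]
print(replay(wal, start_tli=1, peek=False))
print(replay(wal, start_tli=1, peek=True))
```

In this model, the run without peeking leaves the replay-end timeline at 1 for a position that ends a timeline-2 checkpoint record, which mirrors the control file contents shown above. The run with peeking reports timeline 2 for the same position.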
>
> This patch has already been included in HEAD. Right?
>
> I found another "requested timeline does not contain minimum recovery point"
> error scenario in HEAD:
>
> 1. Set up the master 'M', one standby 'S1', and one cascade standby 'S2'.
> 2. Shut down the master 'M' and promote the standby 'S1', and wait for 'S2'
> to reconnect to 'S1'.
> 3. Set up new cascade standby 'S3' connecting to 'S2'.
> Then 'S3' fails to start the recovery because of the following error:
>
> FATAL: requested timeline 2 does not contain minimum recovery
> point 0/3000000 on timeline 1
> LOG: startup process (PID 33104) exited with exit code 1
> LOG: aborting startup due to startup process failure
>
> The result of pg_controldata of 'S3' is:
>
> Latest checkpoint location: 0/3000088
> Prior checkpoint location: 0/2000060
> Latest checkpoint's REDO location: 0/3000088
> Latest checkpoint's REDO WAL file: 000000020000000000000003
> Latest checkpoint's TimeLineID: 2
> <snip>
> Min recovery ending location: 0/3000000
> Min recovery ending loc's timeline: 1
> Backup start location: 0/0
> Backup end location: 0/0
>
> The content of the timeline history file '00000002.history' is:
>
> 1 0/3000088 no recovery target specified
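
To make the numbers above easier to compare, here is a minimal sketch that converts the `X/Y` positions to integers, derives a WAL segment file name, and reads the history file line. The helper names are hypothetical, and the 16 MB segment size is an assumption (the default). None of this is PostgreSQL's own code.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL position (two hex halves) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the segment containing lsn: timeline, log id, segment number."""
    segments_per_log = 0x100000000 // WAL_SEGMENT_SIZE
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (tli, segno // segments_per_log, segno % segments_per_log)


def parse_history_line(line: str):
    """Split a history line into (parent timeline, switch LSN, reason)."""
    tli, lsn, reason = line.split(None, 2)
    return int(tli), parse_lsn(lsn), reason


checkpoint = parse_lsn("0/3000088")
min_recovery = parse_lsn("0/3000000")
parent_tli, switch_lsn, _ = parse_history_line("1 0/3000088 no recovery target specified")

print(wal_file_name(2, checkpoint))   # segment holding the latest checkpoint
print(min_recovery < switch_lsn)      # is the min recovery point before the switch?
print(switch_lsn == checkpoint)       # does the switch point equal the checkpoint location?
```

Under these assumptions, the minimum recovery point 0/3000000 lies before the switch point recorded in the history file, and that switch point coincides with the latest checkpoint location.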

I can still reproduce this problem. Attached is a shell script
that reproduces it.

Regards,

--
Fujii Masao

Attachment Content-Type Size
fujii_test.sh application/x-sh 1.1 KB
