From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'PostgreSQL-development' <pgsql-hackers(at)postgreSQL(dot)org>, 'Thom Brown' <thom(at)linux(dot)com>
Subject: Re: Switching timeline over streaming replication
Date: 2012-12-07 15:51:42
Message-ID: 50C2108E.9020103@vmware.com
Lists: pgsql-hackers

On 06.12.2012 15:39, Amit Kapila wrote:
> On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
>> On 05.12.2012 14:32, Amit Kapila wrote:
>>> On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
>>>> After some diversions to fix bugs and refactor existing code, I've
>>>> committed a couple of small parts of this patch, which just add some
>>>> sanity checks to notice incorrect PITR scenarios. Here's a new
>>>> version of the main patch based on current HEAD.
>>>
>>> After testing with the new patch, the following problems are observed.
>>>
>>> Defect - 1:
>>>
>>> 1. start primary A
>>> 2. start standby B following A
>>> 3. start cascade standby C following B.
>>> 4. start another standby D following C.
>>> 5. Promote standby B.
>>> 6. After successful time line switch in cascade standby C & D, stop D.
>>> 7. Restart D, Startup is successful and connecting to standby C.
>>> 8. Stop C.
>>> 9. Restart C, startup is failing.
>>
>> Ok, the error I get in that scenario is:
>>
>> C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
>> C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1
>> C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure
>>
>
>>
>> That mismatch causes the error. I'd like to fix this by always treating
>> the checkpoint record to be part of the new timeline. That feels more
>> correct. The most straightforward way to implement that would be to peek
>> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
>> it's a checkpoint record that changes TLI, set replayEndTLI to the new
>> timeline before calling the redo-function. But it's a bit of a
>> modularity violation to peek into the record like that.
>>
>> Or we could just revert the sanity check at beginning of recovery that
>> throws the "requested timeline 2 does not contain minimum recovery point
>> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
>> record that says "unexpected timeline ID %u in checkpoint record, before
>> reaching minimum recovery point %X/%X on timeline %u" checks basically
>> the same thing, but at a later stage. However, the way
>> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
>> fix that.
>>
>> I'm thinking of something like the attached (with some more comments
>> before committing). Thoughts?
>
> This has fixed the problem reported.
> However, I am not able to think will there be any problem if we remove check
> "requested timeline 2 does not contain minimum recovery point
> 0/3023F08 on timeline 1" at beginning of recovery and just update
> replayEndTLI with ThisTimeLineID?

Well, it seems wrong for the control file to contain a situation like this:

pg_control version number: 932
Catalog version number: 201211281
Database system identifier: 5819228770976387006
Database cluster state: shut down in recovery
pg_control last modified: Fri 7 December 2012 17:39:57
Latest checkpoint location: 0/3023EA8
Prior checkpoint location: 0/2000060
Latest checkpoint's REDO location: 0/3023EA8
Latest checkpoint's REDO WAL file: 000000020000000000000003
Latest checkpoint's TimeLineID: 2
...
Time of latest checkpoint: Fri 7 December 2012 17:39:49
Min recovery ending location: 0/3023F08
Min recovery ending loc's timeline: 1

Note the latest checkpoint location and its TimelineID, and compare them
with the min recovery ending location. The min recovery ending location
is ahead of latest checkpoint's location; the min recovery ending
location actually points to the end of the checkpoint record. But how
come the min recovery ending location's timeline is 1, while the
checkpoint record's timeline is 2?
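
To make the discrepancy concrete, here is a minimal sketch in Python that
compares the values from the control file output above. It is only a toy
check, not PostgreSQL code: the function names parse_lsn and
timeline_goes_backwards are made up for illustration. It assumes only that an
LSN printed as "hi/lo" is two hexadecimal 32-bit halves of one 64-bit
position.

```python
def parse_lsn(text):
    """Convert an LSN printed as 'hi/lo' (two hex numbers) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def timeline_goes_backwards(ckpt_lsn, ckpt_tli, min_rec_lsn, min_rec_tli):
    """Return True if a later WAL position is tagged with an older timeline.

    Hypothetical helper: it flags the situation where the min recovery
    point lies at or after the latest checkpoint, yet carries a lower
    timeline ID than that checkpoint.
    """
    is_later = parse_lsn(min_rec_lsn) >= parse_lsn(ckpt_lsn)
    return is_later and min_rec_tli < ckpt_tli


# Values copied from the control file output quoted in this message.
inconsistent = timeline_goes_backwards("0/3023EA8", 2, "0/3023F08", 1)
print(inconsistent)
```

With minRecoveryPointTLI set to 2 instead of 1, the same check reports no
discrepancy.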

Now maybe that would happen to work if we remove the sanity check, but it
still seems horribly confusing. I'm afraid that discrepancy will come
back to haunt us later if we leave it like that. So I'd like to fix that.

After mulling this over some more, I propose the attached patch. With the
patch, we peek into the checkpoint record, and actually perform the
timeline switch (by changing ThisTimeLineID) before replaying it. That
way the checkpoint record is really considered to be on the new timeline
for all purposes. At the moment, the only difference that makes in
practice is that we set replayEndTLI, and thus minRecoveryPointTLI, to
the new TLI, but it feels logically more correct to do it that way.
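
The ordering difference can be illustrated with a toy replay loop in Python.
This is a simplified model under stated assumptions, not the attached patch
and not the real recovery code: the Record class, the replay function and the
peek_first flag are invented for illustration, and "redo" is reduced to the
timeline switch alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    end_lsn: int                    # position just past this record
    new_tli: Optional[int] = None   # set only for a checkpoint that changes timeline


def replay(records, start_tli, peek_first):
    """Replay records; return the final (replay end position, replay end TLI).

    peek_first=False models the old ordering: the replay end markers are
    updated first, and the timeline switches only while the record is redone.
    peek_first=True models the proposal: look at the record and switch the
    timeline before the replay end markers are updated.
    """
    this_tli = start_tli
    replay_end_lsn, replay_end_tli = 0, start_tli
    for rec in records:
        if peek_first and rec.new_tli is not None:
            this_tli = rec.new_tli          # switch before replaying
        replay_end_lsn = rec.end_lsn
        replay_end_tli = this_tli
        if rec.new_tli is not None:
            this_tli = rec.new_tli          # "redo" of the checkpoint record
    return replay_end_lsn, replay_end_tli


# An ordinary record followed by a checkpoint record that switches to timeline 2.
wal = [Record(end_lsn=0x3023EA8), Record(end_lsn=0x3023F08, new_tli=2)]
print(replay(wal, start_tli=1, peek_first=False))
print(replay(wal, start_tli=1, peek_first=True))
```

In this model the old ordering leaves the replay end position tagged with
timeline 1 even though the checkpoint record itself is on timeline 2, while
peeking first tags it with timeline 2.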

- Heikki

Attachment Content-Type Size
fix-minrecoverypointtli-2.patch text/x-diff 5.6 KB
