Re: Tracking latest timeline in standby mode

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking latest timeline in standby mode
Date: 2010-11-02 15:08:44
Message-ID: 4CD0297C.9080004@enterprisedb.com
Lists: pgsql-hackers

On 02.11.2010 07:15, Fujii Masao wrote:
> On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Yeah, that's one approach. Another is to validate the TLI in the xlog page
>> header, it should always match the current timeline we're on. That would
>> feel more robust to me.
>
> Yeah, that seems better.
>
>> We're a bit fuzzy about what TLI is written in the page header when the
>> timeline changing checkpoint record is written, though. If the checkpoint
>> record fits in the previous page, the page will carry the old TLI, but if
>> the checkpoint record begins a new WAL page, the new page is initialized
>> with the new TLI. I think we should rearrange that so that the page header
>> will always carry the old TLI.
>
> Or after rescanning the timeline history files, what about refetching the last
> applied record and checking whether the TLI in the xlog page header is the
> same as the previous TLI? IOW, what about using the header of the xlog page
> including the last applied record instead of the following checkpoint record?

I guess that would work too, but it seems problematic to move backwards
during recovery.

> Anyway ISTM we should also check that the min recovery point is not ahead
> of the TLI switch location. So we need to fetch the record in the min recovery
> point and validate the TLI of the xlog page header. Otherwise, the database
> might get corrupted. This can happen, for example, when you remove all the
> WAL files in pg_xlog directory and restart the standby.

Yes, that's another problem. We don't know which timeline the min
recovery point refers to. We should store the TLI along with
minRecoveryPoint; then we can at least check that we're on the right
timeline when we reach minRecoveryPoint, and throw an error if we're not.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
