Re: [Skytools-users] WAL Shipping + checkpoint

From: Sébastien Lardière <slardiere(at)hi-media(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org, skytools-users(at)pgfoundry(dot)org
Subject: Re: [Skytools-users] WAL Shipping + checkpoint
Date: 2009-08-27 12:08:04
Message-ID: 4A967724.7030704@hi-media.com
Lists: pgsql-general

On 27/08/2009 00:18, Mark Kirkwood wrote:
> Sébastien Lardière wrote:
>> On 26/08/2009 04:46, Mark Kirkwood wrote:
>>> Sébastien Lardière wrote:
>>>> Hi All,
>>>>
>>>> I have a cluster (Pg 8.3.7) with WAL shipping, and a few hours ago
>>>> the master had to restart.
>>>>
>>>> I use walmgr from Skytools, which works very well.
>>>>
>>>> I have already restarted the master before without any problem, but
>>>> today the slave doesn't behave as I expect. The "Time of latest
>>>> checkpoint" field from pg_controldata on the slave keeps the same
>>>> value, although WAL files are processed correctly.
>>>>
>>>> I tried to restart the slave, but after it had again processed all
>>>> the WAL from "Time of latest checkpoint" onward, it did nothing else:
>>>> the latest checkpoint stays at the same value.
>>>>
>>>> I don't know whether it's important (I think so), and I can't fix it.
>>>>
>>> It is normal for it to lag behind somewhat on the slave (depending
>>> on what your checkpoint timeout etc settings are).
>>>
>>> However, I've noticed what you are seeing as well - particularly
>>> when there are no actual data changes coming through in the logs -
>>> the slave checkpoint time does not change even though there have been
>>> checkpoints on the master (I may have a look in the code to see what
>>> the story really is...if I have time).
>>>
>>
>> Yes, but the delay between the last checkpoint on the master and on the
>> slave is now very high (100 000 sec), because the last checkpoint on
>> the slave was yesterday (assuming pg_controldata is right).
>>
>> Here is a graph from our munin plugin:
>> http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png
>>
>> The blue line represents the average interval between two WAL files
>> processed on the slave, and the green line the delay between the last
>> checkpoint on the master and on the slave.
>>
>> Maybe it's not a good indicator, but the green line makes me think
>> there is a problem.
>>
>>
> Do you have archive_timeout set? If so, then what *could* be happening
> is this:
>
> There are actually no "real" data changes being made on your master
> for some reason. So every time archive_timeout is reached a log full
> of no changes is shipped to your slave and applied - and no checkpoint
> times are changed for reasons I mentioned above.
>
>

Thanks, but we have not set archive_timeout, and we do have a lot of
real data changes.

That's why I don't understand why a checkpoint never happens on the slave.
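
For reference, the master/slave checkpoint delay discussed in this thread
can be computed roughly as follows. This is only a minimal sketch, not the
actual munin plugin: the function names are made up for illustration, and
the timestamp format assumes pg_controldata prints dates as in the C locale
(for example "Thu Aug 27 12:00:00 2009"), which may differ on other setups.

```python
import re
from datetime import datetime

# Field label as printed by pg_controldata.
FIELD = "Time of latest checkpoint"

# ASSUMPTION: C-locale date layout; adjust if your pg_controldata output differs.
DEFAULT_FORMAT = "%a %b %d %H:%M:%S %Y"


def latest_checkpoint(controldata_output, fmt=DEFAULT_FORMAT):
    """Extract the latest checkpoint time from pg_controldata text output."""
    match = re.search(
        r"^" + re.escape(FIELD) + r":\s*(.+?)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("field %r not found in pg_controldata output" % FIELD)
    return datetime.strptime(match.group(1), fmt)


def checkpoint_lag_seconds(master_output, slave_output, fmt=DEFAULT_FORMAT):
    """Seconds between the master's and the slave's latest checkpoint."""
    master_time = latest_checkpoint(master_output, fmt)
    slave_time = latest_checkpoint(slave_output, fmt)
    return (master_time - slave_time).total_seconds()
```

The two input strings would be the text printed by running pg_controldata
against the data directory on each host. A lag that keeps growing while WAL
files are still being replayed is the symptom described above.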

--
Sébastien Lardière
