Re: [Bug fix]There is the case archive_timeout parameter is ignored after recovery works.

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, higuchi(dot)daisuke(at)fujitsu(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Bug fix]There is the case archive_timeout parameter is ignored after recovery works.
Date: 2020-06-30 01:01:02
Message-ID: 40f5df69-99a2-5f7c-5432-6f85fa15e9c5@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/06/30 9:14, Kyotaro Horiguchi wrote:
> Opps! I misunderstood that.
>
> At Mon, 29 Jun 2020 13:00:25 +0000, "higuchi(dot)daisuke(at)fujitsu(dot)com" <higuchi(dot)daisuke(at)fujitsu(dot)com> wrote in
>> Fujii-san, thank you for comments.
>>
>>> The cause of this problem is that the checkpointer's sleep time is calculated
>> >from both checkpoint_timeout and archive_timeout during normal running,
>>> but calculated only from checkpoint_timeout during recovery. So Daisuke-san's
>>> patch tries to change that so that it's calculated from both of them even
>>> during recovery. No?
>>
>> Yes, it's exactly so.
>>
>>> last_xlog_switch_time is not updated during recovery. So "elapsed_secs" can be
>>> large and cur_timeout can be negative. Isn't this problematic?
>>
>> Yes... My patch was missing this.
>
> The patch also makes WaitLatch called with zero timeout, which causes
> assertion failure.
>
>> How about using the original archive_timeout value for calculating cur_timeout during recovery?
>>
>> if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
>> {
>> elapsed_secs = now - last_xlog_switch_time;
>> if (elapsed_secs >= XLogArchiveTimeout)
>> continue; /* no sleep for us ... */
>> cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
>> }
>> + else if (XLogArchiveTimeout > 0)
>> + cur_timeout = Min(cur_timeout, XLogArchiveTimeout);
>>
>> During recovery, accurate cur_timeout is not calculated because elapsed_secs is not used.

Yes, that's an idea. But I'm a bit concerned about that this change makes
checkpointer wake up more frequently than necessary during recovery.
Which may increase the idle power consumption of checkpointer during
recovery. Of course, this would not be so problematic in practice because
we can expect that archive_timeout is not so small. But it seems better to
avoid unncessary wake-ups if we can easily do that.

>> However, after recovery is complete, WAL archiving will start by the next archive_timeout is reached.
>> I felt it is enough to solve this problem.
>
> That causes unwanted change of cur_timeout during recovery.
>
>>> As another approach, what about waking the checkpointer up at the end of
>>> recovery like we already do for walsenders?
>
> We don't want change checkpoint interval during recovery, that means
> we cannot cnosider archive_timeout at the fist checkpointer after
> recovery ends. So I think that the suggestion from Fujii-san is the
> direction.

+1
If this idea has some problems, we can revisit Daisuke-san's idea.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-06-30 01:08:14 Re: Resetting spilled txn statistics in pg_stat_replication
Previous Message Kyotaro Horiguchi 2020-06-30 00:14:23 Re: [Bug fix]There is the case archive_timeout parameter is ignored after recovery works.