Re: pg_basebackup may fail to send feedbacks.

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup may fail to send feedbacks.
Date: 2015-04-01 09:38:31
Message-ID: CAHGQGwHfz-zObLb65LQLFwpPvTvLVHHDHP+vfw4KvHw2q_v9fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 10, 2015 at 5:29 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hi, the attached is the v5 patch.
>
> - Do feGetCurrentTimestamp() only when necessary.
> - Rebased to current master
>
>
> At Mon, 2 Mar 2015 20:21:36 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwG1tJHpG03oZgwoKxt5wYD5v4S3HuTgSx7RotBhHnjwJw(at)mail(dot)gmail(dot)com>
>> On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello, the attached is the v4 patch that checks feedback timing
>> > every WAL segments boundary.
> ..
>> > I said that checking whether to send feedback every boundary
>> > between WAL segments seemed too long but after some rethinking, I
>> > changed my mind.
>> >
>> > - The most large possible delay source in the busy-receive loop
>> > is fsyncing at closing WAL segment file just written, this can
>> > take several seconds. Freezing longer than the timeout
>> > interval could not be saved and is not worth saving anyway.
>> >
>> > - 16M bytes-disk-writes intervals between gettimeofday() seems
>> > to be gentle enough even on platforms where gettimeofday() is
>> > rather heavy.
>>
>> Sounds reasonable to me.
>>
>> So we don't need to address the problem in walreceiver side so proactively
>> because it tries to send the feedback every after flushing the WAL records.
>> IOW, the problem you observed is less likely to happen. Right?
>>
>> + now = feGetCurrentTimestamp();
>> + if (standby_message_timeout > 0 &&
>
> Surely it would hardly happen, especially on ordinary
> configuration.
>
> However, the continuous receiving of the replication stream is a
> quite normal behavior even if hardly happens. So the receiver
> should guarantee the feedbacks to be sent by logic as long as it
> is working normally, as long as the code for the special case
> won't be too large and won't take too long time:).
>
> The current walreceiver doesn't look trying to send feedbacks
> after flushing every wal records. HandleCopyStream will
> restlessly process the records in a gapless replication stream,
> sending feedback only when keepalive packet arrives. It is the
> fact that the response to the keepalive would help greatly but it
> is not what the keepalives are for. It is intended to be used to
> confirm if a silent receiver is still alive.
>
> Even with this fix, the case that one write or flush takes longer
> time than the feedback interval cannot be saved, but it would be
> ok since it should be regarded as disorder.
>
>
>> Minor comment: should feGetCurrentTimestamp() be called after the check of
>> standby_message_timeout > 0, to avoid unnecessary calls of that?
>
> Ah, you're right. I'll fixed it.

Even if the timeout has not elapsed yet, if synchronous mode is true, we should
send feedback here?

Regards,

--
Fujii Masao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2015-04-01 10:00:51 Re: Maximum number of WAL files in the pg_xlog directory
Previous Message Kyotaro HORIGUCHI 2015-04-01 08:01:40 Re: vac truncation scan problems