Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-11-08 17:12:14
Message-ID: CAHGQGwGdYJ1tJDHH+FURgaJhRR1kmpvKatpBdmgLsk7ZMhYKPA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
>> On 19.10.2012 14:42, Amit kapila wrote:
>> > On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
>> >> Before implementing the timeout parameter, I think that it's better to
>> >> change both pg_basebackup background process and pg_receivexlog so that
>> >> they send back the reply message immediately when they receive the
>> >> keepalive message requesting the reply. Currently, they always ignore
>> >> such keepalive message, so status interval parameter (-s) in them always
>> >> must be set to the value less than replication timeout. We can avoid
>> >> this troublesome parameter setting by introducing the same logic of
>> >> walreceiver into both pg_basebackup background process and
>> >> pg_receivexlog.
>> >
>> > Please find the patch attached to address the modification mentioned by
>> > you (send immediate reply for keepalive).
>> > Both basebackup and pg_receivexlog uses the same function
>> > ReceiveXLogStream, so single change for both will address the issue.
>>
>> Thanks, committed this one after shuffling it around the changes I
>> committed yesterday. I also updated the docs to not claim that -s option
>> is required to avoid timeout disconnects anymore.
>
> Thank you.
> However I think still the issue will not be completely solved.
> pg_basebackup/pg_receivexlog can still take long time to
> detect network break as they don't have timeout concept. To do that I have
> sent one proposal which is mentioned at end of mail chain:
> http://archives.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853BBED(at)szxeml509-mbs
>
> Do you think there is any need to introduce such mechanism in
> pg_basebackup/pg_receivexlog?

Are you planning to introduce the timeout mechanism in the pg_basebackup
main process, or in the background process? It would be useful to
implement it in both.

BTW, IIRC the walsender has no timeout mechanism while it is sending
backup data to pg_basebackup, so it would also be useful to implement a
timeout mechanism for the walsender during backup.
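
To make the idea concrete, here is a minimal sketch of the receive-side
logic under discussion: reply at once to a keepalive that requests a
reply, and treat the connection as broken once nothing has arrived for
longer than a timeout. It is Python for illustration only, not
PostgreSQL code; the ReceiveTimeout class, its method names and the
timeout_s parameter are all hypothetical.

```python
import time


class ReceiveTimeout:
    """Tracks when data last arrived and decides when to give up.

    Hypothetical helper for illustration; not a PostgreSQL API.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed timeout setting, in seconds
        self.clock = clock              # injectable so the logic can be tested
        self.last_receive = clock()

    def on_message(self, reply_requested):
        """Record activity; return True if a status reply should be sent now."""
        self.last_receive = self.clock()
        return reply_requested

    def expired(self):
        """True once no message has arrived for longer than timeout_s."""
        return self.clock() - self.last_receive > self.timeout_s
```

A receive loop would call on_message() for each message it reads from
the connection, send a status reply whenever that returns True, and
check expired() each time its wait for data times out, closing the
connection when it returns True.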

Regards,

--
Fujii Masao
