From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Replication server timeout patch |
Date: | 2011-03-11 13:18:22 |
Message-ID: | AANLkTimnwxEv-ZbqBLCSBSvmq-80vzvDb2u0pPchGm2r@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> Why does internal_flush_if_writable compute bufptr differently from
>>>>> internal_flush? And shouldn't it be static?
>>>>>
>>>>> It seems to me that this ought to be refactored so that you don't
>>>>> duplicate so much code. Maybe static int internal_flush(bool
>>>>> nonblocking).
>>>>>
>>>>> I don't think that the while (bufptr < bufend) loop needs to contain
>>>>> the code to set and clear the nonblocking state. You could do the
>>>>> whole loop with nonblocking mode turned on and then restore blocking
>>>>> mode just once at the end. Besides possibly being clearer, that would
>>>>> be more efficient and leave less room for unexpected failures.
>>>>
>>>> All these comments seem to make sense. Will fix. Thanks!
>>>
>>> Done. I attached the updated patch.
>>
>> I rebased the patch against current git master.
>
> I added this replication timeout patch to the next CF.
>
> To help future reviewers, here is why this feature is needed:
>
> Without this feature, walsender can linger unexpectedly for a while after
> the standby crashes or a network outage occurs. TCP keepalive can
> improve this situation to a certain extent, but it's not perfect. A
> lingering walsender can cause several problems.
>
> For example, when hot_standby_feedback is enabled, a lingering
> walsender prevents the oldest xmin from advancing and interferes with
> vacuuming on the master. As another example, when you use synchronous
> replication and a walsender in SYNC mode gets stuck, no other synchronous
> standby candidate can switch to SYNC mode until that walsender exits,
> and all transactions pause in the meantime.
>
> This feature causes walsender to exit when no reply arrives from the
> standby before the replication timeout expires, which avoids the
> problems above.
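
The timeout rule described in the quoted text can be sketched in C roughly as follows. This is only an illustration of the idea, not code from the patch: the function name `standby_timed_out`, the `millis_t` type, and the convention that a non-positive timeout disables the check are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for the sketch: a timestamp or interval in milliseconds. */
typedef int64_t millis_t;

/*
 * Hypothetical helper, not PostgreSQL's actual walsender code.
 *
 * Returns true when the sender should give up on the standby: a timeout is
 * configured (timeout_ms > 0) and at least that long has passed since the
 * last reply was received.
 */
static bool
standby_timed_out(millis_t now_ms, millis_t last_reply_ms, millis_t timeout_ms)
{
    if (timeout_ms <= 0)
        return false;           /* assumption: non-positive disables the check */

    return now_ms - last_reply_ms >= timeout_ms;
}
```

A sender loop would call something like this each time it wakes up and exit when it returns true, so a dead standby cannot hold the connection (and anything tied to it) indefinitely.
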
I think we should consider making this change for 9.1. This is a real
wart, and it's likely to become even more of a problem with sync rep.
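
For readers following the review comments quoted near the top of this message, here is a minimal sketch of the suggested shape: one flush routine taking a `bool nonblocking` flag, with the descriptor's mode switched once before the `while (bufptr < bufend)` loop and restored once after it. The name `flush_buffer`, its signature, and its return convention are assumptions for illustration only; this is not the patch's `internal_flush`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch: write buf[0..len) to fd.
 *
 * Returns the number of bytes written, or -1 on a hard error. With
 * nonblocking = true it stops early and returns a short count if the
 * descriptor would block. Assumes fd is in blocking mode on entry.
 */
static long
flush_buffer(int fd, const char *buf, size_t len, bool nonblocking)
{
    const char *bufptr = buf;
    const char *bufend = buf + len;
    bool        failed = false;
    int         saved_flags = fcntl(fd, F_GETFL);

    if (saved_flags < 0)
        return -1;

    /* Switch mode once, before the loop, rather than on every iteration. */
    if (nonblocking && fcntl(fd, F_SETFL, saved_flags | O_NONBLOCK) < 0)
        return -1;

    while (bufptr < bufend)
    {
        ssize_t     r = write(fd, bufptr, (size_t) (bufend - bufptr));

        if (r < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted: retry the write */
            if (nonblocking && (errno == EAGAIN || errno == EWOULDBLOCK))
                break;          /* would block: leave the rest unsent */
            failed = true;      /* any other error is treated as fatal */
            break;
        }
        bufptr += r;
    }

    /* Restore the original mode once, after the loop. */
    if (nonblocking && fcntl(fd, F_SETFL, saved_flags) < 0)
        failed = true;

    return failed ? -1 : (long) (bufptr - buf);
}
```

Keeping both mode switches outside the loop means there are only two `fcntl` calls that can fail per flush, regardless of how many partial writes the loop performs.
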
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company