Re: [BUGS] replication_timeout not effective

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Dang Minh Huong <kakalot49(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <kyota(dot)horiguchi(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, "<pgsql-bugs(at)postgresql(dot)org>" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] replication_timeout not effective
Date: 2013-04-10 14:44:02
Message-ID: 20130410144402.GD15043@awork2.anarazel.de
Lists: pgsql-bugs pgsql-hackers

On 2013-04-10 23:37:44 +0900, Dang Minh Huong wrote:
> Thanks all,
>
> (2013/04/10 22:55), Andres Freund wrote:
> >On 2013-04-10 22:38:07 +0900, Kyotaro HORIGUCHI wrote:
> >>Hello,
> >>
> >>On Wed, Apr 10, 2013 at 6:57 PM, Dang Minh Huong <kakalot49(at)gmail(dot)com> wrote:
> >>>In 9.3, it sounds like replication_timeout has been replaced by wal_sender_timeout.
> >>>So if it is solved in 9.3, I think there is a way to terminate it.
> >>>I hope it is fixed in 9.1 soon.
> >>Hmm. He said that,
> >>
> >>>But in my environment the sender process hangs (for several tens of minutes) if I turn off (by powering off) the standby PC while *pg_basebackup* is executing.
> >>Does basebackup run only on the 'replication connection'?
> >>As far as I saw, base backup uses a 'base backup' connection in addition
> >>to the 'streaming' connection. The former seems not to be under the control
> >>of wal_sender_timeout or replication_timeout and is easily blocked at
> >>send(2) after a sudden cut of the underlying network connection,
> >>although the latter is indeed terminated by them.
> >Yes, it's run via a walsender connection. The only "problem" is that it
> >doesn't check for those timeouts. I am not sure it would be a good thing
> >to do so to be honest. At least not using the same timeout as actual WAL
> >sending, that just has different characteristics.
> >On the other hand, hanging around that long isn't nice either...
> I tried max_wal_senders set to 1, so while the walsender is hanging
> I cannot run pg_basebackup again (or start the standby DB).
> I am increasing it to 2, so the second run succeeds. But I'm afraid
> that when a third one occurs, the hanging walsender from the first
> will not have terminated yet...
>
> I think not, but is there a way to terminate the hanging walsender without
> restarting the PostgreSQL server or killing the walsender process?
> (Killing the walsender process can cause the DB server to crash,
> so I don't want to do it.)

Depending on where it's hanging, a normal SELECT
pg_terminate_backend(pid); might do it.

Otherwise you will have to wait for the operating system's TCP timeout.
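
A rough sketch of the pg_terminate_backend() route, for illustration only and
not tested against a hung base backup. Assumptions: the hung walsender is
still listed in pg_stat_replication with state = 'backup', the pid column is
named procpid (as on 9.1; it is pid from 9.2 on), and the session is a
superuser. The helper name terminate_backup_walsenders and the DB-API cursor
argument are hypothetical, any driver that follows the Python DB-API would do.

```python
from typing import Any, List, Tuple


def terminate_backup_walsenders(
    cursor: Any, pid_column: str = "procpid"
) -> List[Tuple[int, bool]]:
    """Ask the server to terminate walsenders that are sending a base backup.

    cursor is any DB-API cursor on a superuser connection (hypothetical
    setup). Returns (pid, signal_sent) pairs. A True result only means the
    signal was delivered; a process blocked in send(2) may still not exit
    until the operating system's TCP timeout expires.
    """
    # The column name is interpolated into the SQL, so accept only the two
    # known spellings: "procpid" (9.1) and "pid" (9.2 and later).
    if pid_column not in ("procpid", "pid"):
        raise ValueError("pid_column must be 'procpid' or 'pid'")

    cursor.execute(
        f"SELECT {pid_column}, pg_terminate_backend({pid_column}) "
        "FROM pg_stat_replication "
        "WHERE state = 'backup'"  # assumption: the hung sender shows this state
    )
    return [(int(pid), bool(sent)) for pid, sent in cursor.fetchall()]
```

The same query can be run by hand in psql. Restricting it to state = 'backup'
is meant to leave a healthy streaming standby alone; drop the WHERE clause
only if you really want every walsender signalled.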

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
