Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com>
Cc: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-11-16 11:40:03
Message-ID: 00ad01cdc3ef$1ede2f10$5c9a8d30$@kapila@huawei.com
Lists: pgsql-bugs pgsql-hackers

On Thursday, November 15, 2012 7:29 PM Amit kapila wrote:
> On Monday, November 12, 2012 8:23 PM Fujii Masao wrote:
> On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote:
> >> On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> >> wrote:
> >> > On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote:
> >> >> On 19.10.2012 14:42, Amit kapila wrote:
> >> >> > On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote:
>
> >>> Are you planning to introduce the timeout mechanism in pg_basebackup
> Apart from the above, I feel the remaining problem is the function call
> PQgetResult():
> 1. Wherever a query is sent from BaseBackup, it calls PQgetResult() to
> receive the result of the query.
> As PQgetResult() is a blocking function (it calls pqWait, which can
> hang), if the network goes down before the query itself is sent,
> there will never be any result, so it will keep hanging in
> PQgetResult().
> IMO, it can be solved in one of the following ways:
> a. Create a corresponding non-blocking function. But this function is
> called from inside some of the other libpq functions
> (PQexec->PQexecFinish->PQgetResult), so it can be a little tricky to
> solve it this way.
> b. Add a receive_timeout variable to the PGconn structure and use it in
> pqWait as the timeout whenever it is set.
> c. Any other better way?
>
>
> >> BTW, IIRC the walsender has no timeout mechanism during sending
> >> backup data to pg_basebackup. So it's also useful to implement the
> >> timeout mechanism for the walsender during backup.
> >
>
> >What about using pq_putmessage_noblock()?
>
> I think maybe some more functions also need to be made noblock. I
> am still evaluating.

I have done the analysis, and it seems that the APIs below also need
equivalent noblock variants; otherwise the same problem can happen, as
they are also used in this flow.
a. pq_endmessage
b. EndCommand
c. pq_puttextmessage
d. pq_putemptymessage
e. ReadyForQuery - needed because the walsender and a normal backend
now share this code.
f. ReadCommand - needed for the same reason. A solution for it may be
tricky, as pq_getbyte is not called from the first-level function.
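To illustrate what a noblock variant has to do on the sending side, below is a rough sketch using only POSIX calls. It is not the backend's pq_* code: send_with_timeout is a hypothetical name, and the timeout here applies to each individual wait rather than to the whole send, which is a simplification.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send 'len' bytes on 'sock' without blocking forever.
 *
 * Returns 0 when everything was sent, 1 if the socket did not become
 * writable within 'timeout_ms' during some wait, -1 on error.
 * SIGPIPE handling is omitted in this sketch.
 */
int
send_with_timeout(int sock, const char *buf, size_t len, int timeout_ms)
{
    int flags = fcntl(sock, F_GETFL, 0);

    /* Put the socket into non-blocking mode so send() never hangs. */
    if (flags < 0 || fcntl(sock, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    while (len > 0)
    {
        struct pollfd pfd;
        int           rc;
        ssize_t       n = send(sock, buf, len, 0);

        if (n > 0)
        {
            /* Partial or full write: advance past what was accepted. */
            buf += n;
            len -= (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted, just retry */
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;          /* real error, e.g. connection reset */

        /* Send buffer is full: wait for space, but only for timeout_ms. */
        pfd.fd = sock;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0)
            return 1;           /* peer is not draining data: timed out */
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

Each of the functions listed above would need the same treatment wherever it can end up in a blocking write, which is why the list is longer than just pq_putmessage.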

Suggestions/Thoughts?

With Regards,
Amit Kapila.
