Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review of "pg_basebackup and pg_receivexlog to use non-blocking socket communication", was: Re: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2013-01-21 12:51:44
Message-ID: CABUevEzhwapA-Zc-LKfjPTM=kLCgEt8TY9G5+rh8g5DjoXFuYA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Jan 18, 2013 at 7:50 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Wednesday, January 16, 2013 4:02 PM Heikki Linnakangas wrote:
>> On 07.01.2013 16:23, Boszormenyi Zoltan wrote:
>> > Since my other patch against pg_basebackup is now committed,
>> > this patch no longer applies cleanly (patch rejects 2 hunks).
>> > The fixed-up patch is attached.
>>
>> Now that I look at this from a high-level perspective, why are we only
>> worried about timeouts in the Copy-mode and when connecting? The initial
>> checkpoint could take a long time too, and if the server turns into a
>> black hole while the checkpoint is running, pg_basebackup will still
>> hang. Then again, a short timeout on that phase would be a bad idea,
>> because the checkpoint can indeed take a long time.
>
> True, but IMO, if somebody wants to take a base backup, they should do it
> when the server is not loaded.

A lot of installations don't have such an option, because there is no
time when the server is not loaded.

>> In streaming replication, the keep-alive messages carry additional
>> information, the timestamps and WAL locations, so a keepalive makes
>> sense at that level. But otherwise, aren't we just trying to reimplement
>> TCP keepalives? TCP keepalives are not perfect, but if we want to have
>> an application-level timeout, it should be implemented in the FE/BE
>> protocol.
>>
>> I don't think we need to do anything specific to pg_basebackup. The user
>> can simply specify TCP keepalive settings in the connection string, as
>> with any libpq program.
>
> I think the user currently has no way to specify TCP keepalive settings
> for pg_basebackup. Please let me know if there is an existing way.

You can set it through environment variables. As was discussed
elsewhere, it would be good to have the ability to do it natively in
pg_basebackup as well.
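For reference, libpq's TCP keepalive knobs are the connection parameters keepalives, keepalives_idle, keepalives_interval and keepalives_count. The Python sketch below is an illustration only, not part of any patch in this thread: it builds a keyword=value connection string carrying those parameters and applies what are assumed to be the equivalent Linux socket options (SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) directly. The host name and the numeric values are hypothetical. On an otherwise idle connection, a peer that silently disappears is declared dead after roughly idle + count * interval = 30 + 3 * 10 = 60 seconds.

```python
import socket

# Real libpq connection-parameter names; the values are only examples.
KEEPALIVE_PARAMS = {
    "keepalives": 1,            # turn TCP keepalives on (libpq's default)
    "keepalives_idle": 30,      # idle seconds before the first probe
    "keepalives_interval": 10,  # seconds between unanswered probes
    "keepalives_count": 3,      # unanswered probes before giving up
}


def build_conninfo(host: str) -> str:
    """Return a libpq keyword=value connection string with keepalive settings."""
    params = {"host": host, **KEEPALIVE_PARAMS}
    return " ".join(f"{key}={value}" for key, value in params.items())


def detection_time(idle: int, interval: int, count: int) -> int:
    """Approximate seconds until an idle, silent peer is declared dead."""
    return idle + count * interval


def apply_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Set the equivalent kernel options on a TCP socket (Linux constant names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    print(build_conninfo("primary.example.com"))
    p = KEEPALIVE_PARAMS
    print(detection_time(p["keepalives_idle"], p["keepalives_interval"],
                         p["keepalives_count"]))
```

Run as a script, it prints "host=primary.example.com keepalives=1 keepalives_idle=30 keepalives_interval=10 keepalives_count=3" followed by 60. Keepalive probes are only sent while no unacknowledged data is outstanding, so they do not bound a hang in the middle of a write.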

> I think specifying TCP settings is very cumbersome for most users; that's
> why most standard interfaces (ODBC/JDBC) have such an application-level
> timeout mechanism.
>
> Implementing it in the FE/BE protocol (do you mean making such
> non-blocking behavior part of libpq, or something else?) might make it
> generic and usable by other clients as well, but it might need a few
> interface changes.

If it's specifying them that is cumbersome, then that's the part we
should fix, rather than modifying the protocol, no?
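As a contrast to kernel keepalives, an application-level timeout of the kind Amit describes can be built by waiting on a non-blocking socket with a deadline instead of blocking indefinitely in recv(). The Python sketch below shows only that general technique; it is not the code from the patch under review, and the function name recv_with_timeout is hypothetical.

```python
import select
import socket
import time
from typing import Optional


def recv_with_timeout(sock: socket.socket, nbytes: int, timeout_s: float) -> Optional[bytes]:
    """Read up to nbytes, or return None if nothing arrives within timeout_s.

    Raises ConnectionError if the peer closes the connection cleanly.
    """
    sock.setblocking(False)
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # deadline passed: caller may treat the link as dead
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            return None  # silent peer: no data within the timeout
        try:
            data = sock.recv(nbytes)
        except BlockingIOError:
            continue  # spurious wakeup; keep waiting until the deadline
        if not data:
            raise ConnectionError("peer closed the connection")
        return data


if __name__ == "__main__":
    left, right = socket.socketpair()
    print(recv_with_timeout(left, 1024, 0.1))  # None: nothing was sent
    right.sendall(b"keepalive")
    print(recv_with_timeout(left, 1024, 1.0))  # b'keepalive'
    left.close()
    right.close()
```

The trade-off raised earlier in the thread shows up directly in timeout_s: it has to be longer than any legitimately slow phase, such as the initial checkpoint, or the client will abort a healthy backup.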

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
