Re: incorrect handling of the timeout in pg_receivexlog

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: incorrect handling of the timeout in pg_receivexlog
Date: 2012-02-07 09:31:19
Message-ID: 4F30EF67.4030405@enterprisedb.com
Lists: pgsql-hackers

On 07.02.2012 09:03, Fujii Masao wrote:
> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> When I compiled HEAD with --disable-integer-datetimes and tested
>> pg_receivexlog, I encountered unexpected replication timeout. As
>> far as I read the pg_receivexlog code, the cause of this problem is
>> that pg_receivexlog handles the standby message timeout incorrectly
>> in --disable-integer-datetimes. The attached patch fixes this problem.
>> Comments?
>
> receivelog.c
> -------
> timeout.tv_sec = last_status + standby_message_timeout - now - 1;
> if (timeout.tv_sec<= 0)
> -------
>
> Umm.. the above code also handles the timestamp incorrectly. ISTM that the
> root cause of these problems is that receivelog.c uses TimestampTz.

Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles
float timestamps correctly, the caller just assigns the result to an
int64 variable, assuming --enable-integer-datetimes.
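
A minimal sketch of that failure mode, not the actual receivelog.c code:
with float datetimes a timestamp is a double counting seconds, while the
caller's arithmetic assumes an int64 counting microseconds. The helper
names and the exact comparison below are hypothetical and only illustrate
why the status message is never considered due.

```c
#include <stdint.h>

#define USECS_PER_SEC 1000000

/*
 * Hypothetical check: is a status message due? It assumes both
 * timestamps are int64 values counted in microseconds.
 */
static int
status_due_int64(int64_t last_status, int64_t now, int timeout_secs)
{
    return last_status < now - (int64_t) timeout_secs * USECS_PER_SEC;
}

/*
 * Simulates a caller storing a float-datetimes timestamp (double,
 * in seconds) into an int64 variable: the fraction is truncated and
 * the unit is still seconds, not microseconds.
 */
static int64_t
store_float_timestamp(double ts_secs)
{
    return (int64_t) ts_secs;
}
```

Feeding two float timestamps that are 60 seconds apart through
store_float_timestamp() and then into status_due_int64() with a
10-second interval reports "not due", because the difference is 60
units rather than 60,000,000.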

> What about changing receivelog.c so that it uses time_t instead of
> TimestampTz? Which would make the code simpler, I think.

Hmm, that would reduce granularity to seconds. The --statusint option is
given in seconds, but it would be good to have more precision in the
calculations to avoid rounding errors.

But actually, if the purpose of the --statusint option is to avoid
disconnection because of exceeding the server's replication_timeout, one
second granularity just isn't enough to begin with.
replication_timeout is given in milliseconds, so if you set
replication_timeout=900ms in the server, there is no way to make
pg_basebackup/pg_receivexlog reply in time.
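
For illustration, a sketch of how the wait before the next status message
could be computed if everything were kept in milliseconds. The function
and parameter names are hypothetical and this is not the existing code; it
only shows that a sub-second interval (say 450 ms against a 900 ms
replication_timeout) maps cleanly onto a struct timeval.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical helper: time left until the next status message is due.
 * All three arguments are in milliseconds. The result never goes
 * negative; zero means "send now".
 */
static struct timeval
time_until_next_status(int64_t last_status_ms, int64_t now_ms,
                       int64_t interval_ms)
{
    struct timeval tv;
    int64_t remaining_ms;

    assert(interval_ms > 0);

    remaining_ms = last_status_ms + interval_ms - now_ms;
    if (remaining_ms < 0)
        remaining_ms = 0;

    /* Split into whole seconds and the leftover microseconds. */
    tv.tv_sec = (time_t) (remaining_ms / 1000);
    tv.tv_usec = (suseconds_t) ((remaining_ms % 1000) * 1000);
    return tv;
}
```

The returned value could be passed directly as the timeout argument of
select().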

So, --statusint needs to be in milliseconds. And while we're at it, how
difficult would it be to ask the server for the current value of
replication_timeout, and set --statusint automatically based on that? Or
perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
depending on a server setting, you need to pass an option in the client
or it will just silently fail with no indication of what the problem is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
