Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alexey Vasiliev <leopard_ne(at)inbox(dot)ru>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Date: 2015-01-19 16:56:55
Message-ID: 20150119165655.GG23811@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-01-19 17:16:11 +0900, Michael Paquier wrote:
> On Mon, Jan 19, 2015 at 4:10 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Sat, Jan 17, 2015 at 2:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> Not this patch's fault, but I'm getting a bit tired seeing the above open
> >> coded. How about adding a function that does the sleeping based on a
> >> timestamptz and a ms interval?
> > You mean in plugins, right? I don't recall seeing similar patterns in
> > other code paths of backend. But I think that we can use something
> > like that in timestamp.c then because we need to leverage that between
> > two timestamps, the last failure and now():
> > TimestampSleepDifference(start_time, stop_time, internal_ms);
> > Perhaps you have something else in mind?
> >
> > Attached is an updated patch.

> Actually I came with better than last patch by using a boolean flag as
> return value of TimestampSleepDifference and use
> TimestampDifferenceExceeds directly inside it.

> Subject: [PATCH] Add wal_availability_check_interval to control WAL fetching
> on failure

I think that name isn't a very good. And its isn't very accurate
either.

How about wal_retrieve_retry_interval? Not very nice, but I think it's
still better than the above.

> + <varlistentry id="wal-availability-check-interval" xreflabel="wal_availability_check_interval">
> + <term><varname>wal_availability_check_interval</varname> (<type>integer</type>)
> + <indexterm>
> + <primary><varname>wal_availability_check_interval</> recovery parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + This parameter specifies the amount of time to wait when
> + WAL is not available for a node in recovery. Default value is
> + <literal>5s</>.
> + </para>
> + <para>
> + A node in recovery will wait for this amount of time if
> + <varname>restore_command</> returns nonzero exit status code when
> + fetching new WAL segment files from archive or when a WAL receiver
> + is not able to fetch a WAL record when using streaming replication.
> + </para>
> + </listitem>
> + </varlistentry>
> +
> </variablelist>

Walreceiver doesn't wait that amount, but rather how long the connection
is intact. And restore_command may or may not retry.

> /*-------
> * Standby mode is implemented by a state machine:
> @@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
> * machine, so we've exhausted all the options for
> * obtaining the requested WAL. We're going to loop back
> * and retry from the archive, but if it hasn't been long
> - * since last attempt, sleep 5 seconds to avoid
> - * busy-waiting.
> + * since last attempt, sleep the amount of time specified
> + * by wal_availability_check_interval to avoid busy-waiting.
> */
> - now = (pg_time_t) time(NULL);
> - if ((now - last_fail_time) < 5)
> - {
> - pg_usleep(1000000L * (5 - (now - last_fail_time)));
> - now = (pg_time_t) time(NULL);
> - }
> + now = GetCurrentTimestamp();
> + if (TimestampSleepDifference(last_fail_time, now,
> + wal_availability_check_interval))
> + now = GetCurrentTimestamp();

Not bad, that's much easier to read imo.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-01-19 17:05:01 Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Previous Message Stephen Frost 2015-01-19 16:47:18 Re: WITH CHECK and Column-Level Privileges