Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alexey Vasiliev <leopard_ne(at)inbox(dot)ru>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Date: 2015-02-10 13:32:57
Message-ID: CAB7nPqQ_Fm8a-OSEeE6n1nT-LUp7ty85Yc+vWW1v06rwQxC1Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 9, 2015 at 8:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
>>> - * Wait for more WAL to arrive. Time out after 5 seconds,
>>> - * like when polling the archive, to react to a trigger
>>> - * file promptly.
>>> + * Wait for more WAL to arrive. Time out after
>>> the amount of
>>> + * time specified by wal_retrieve_retry_interval, like
>>> + * when polling the archive, to react to a
>>> trigger file promptly.
>>> */
>>> WaitLatch(&XLogCtl->recoveryWakeupLatch,
>>> WL_LATCH_SET | WL_TIMEOUT,
>>> - 5000L);
>>> + wal_retrieve_retry_interval * 1000L);
>>>
>>> This change can prevent the startup process from reacting to
>>> a trigger file. Imagine the case where the large interval is set
>>> and the user want to promote the standby by using the trigger file
>>> instead of pg_ctl promote. I think that the sleep time should be 5s
>>> if the interval is set to more than 5s. Thought?
>>
>> I disagree here. It is interesting to accelerate the check of WAL
>> availability from a source in some cases for replication, but the
>> opposite is true as well as mentioned by Alexey at the beginning of
>> the thread to reduce the number of requests when requesting WAL
>> archives from an external storage type AWS. Hence a correct solution
>> would be to check periodically for the trigger file with a maximum
>> one-time wait of 5s to ensure backward-compatible behavior. We could
>> reduce it to 1s or something like that as well.
>
> You seem to have misunderstood the code in question. Or I'm missing something.
> The timeout of the WaitLatch is just the interval to check for the trigger file
> while waiting for more WAL to arrive from streaming replication. Not related to
> the retry time to restore WAL from the archive.

[Re-reading the code...]
Aah.. Yes you are right. Sorry for the noise. Yes let's wait for a
maximum of 5s then. I also noticed in previous patch that the wait was
maximized to 5s. To begin with, a loop should have been used if it was
a sleep, but as now patch uses a latch this limit does not make much
sense... Patch updated is attached.
Regards,
--
Michael

Attachment Content-Type Size
20150210_wal_source_check_v7.patch text/x-patch 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-02-10 13:43:22 Re: Assertion failure when streaming logical changes
Previous Message Marc Balmer 2015-02-10 13:32:13 Re: For cursors, there is FETCH and MOVE, why no TELL?