Re: Auto-vacuum is not running in 9.1.12

From: Prakash Itnal <prakash074(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, rasna(dot)t(at)nokia(dot)com, sandhya(dot)k_s(at)nokia(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Auto-vacuum is not running in 9.1.12
Date: 2015-06-22 17:35:38
Message-ID: CAHC5u7-1WvP+WQeyJdvbiR62NZeDEkhkzGraM=a94Mai8zoSiA@mail.gmail.com
Lists: pgsql-hackers

Hi Tom/Alvaro,

Kindly let us know whether the correction provided in the previous mail is
acceptable. The current code already handles scenario-1, but it is still
vulnerable to scenario-2.

From previous mail:
*Scenario-1:* current_time (2015) -> changed_to_past (1995) ->
stays-here-for-half-day -> corrected to current_time (2015)
*Scenario-2:* current_time (2015) -> changed_to_future (2020) ->
stays-here-for-half-day -> corrected to current_time (2015)
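
To make the remaining-sleep calculation from the quoted mail below concrete,
here is a minimal sketch. It is not the unix_latch.c code: the names
naive_remaining_ms, remaining_sleep and RemainingSleep are hypothetical, all
times are assumed to be wall-clock milliseconds, and ending the wait when the
clock goes back is only the behaviour proposed in the quoted mail, not an
agreed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a PostgreSQL structure. */
typedef struct
{
    bool    clock_went_back;   /* true if cur_ms is earlier than start_ms */
    int64_t remaining_ms;      /* time left to sleep; 0 ends the wait */
} RemainingSleep;

/*
 * Naive recalculation after an early wakeup: timeout minus elapsed time.
 * If the clock moved back, elapsed time is negative and the result is
 * larger than the timeout that was originally requested.
 */
int64_t
naive_remaining_ms(int64_t timeout_ms, int64_t start_ms, int64_t cur_ms)
{
    return timeout_ms - (cur_ms - start_ms);
}

/*
 * Same calculation, but a negative delta is detected and reported.
 * Assumption: a backward shift ends the wait (remaining_ms = 0), which
 * corresponds to treating it as a timeout.
 */
RemainingSleep
remaining_sleep(int64_t timeout_ms, int64_t start_ms, int64_t cur_ms)
{
    RemainingSleep r;
    int64_t delta_ms;

    assert(timeout_ms >= 0);

    delta_ms = cur_ms - start_ms;
    r.clock_went_back = (delta_ms < 0);

    if (r.clock_went_back || delta_ms >= timeout_ms)
        r.remaining_ms = 0;                      /* stop waiting */
    else
        r.remaining_ms = timeout_ms - delta_ms;  /* keep sleeping */

    return r;
}
```

With a 60000 ms timeout, a start time of 1000000 and a current time of 400000
(the clock moved back by 600000 ms), the naive calculation asks for another
660000 ms of sleep, while remaining_sleep() reports the backward shift and
returns 0.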

We are waiting for your response.

On Sun, Jun 21, 2015 at 2:56 PM, Prakash Itnal <prakash074(at)gmail(dot)com> wrote:

> Hi,
>
> To my understanding, it will probably not open the door to worse situations!
> Please correct me if my understanding below is incorrect.
>
> The latch wakes up in the following three situations:
> a) Socket error (=> result is set to negative number)
> b) timeout (=> result is set to TIMEOUT)
> c) some event arrived on socket (=> result is set to non-zero value, if
> caller registers for arrived events otherwise no value is set)
>
> Given the above conditions, the result can be zero only if an
> unregistered event wakes the latch (*). In that case, the current
> implementation re-evaluates the remaining sleep time. This calculation
> makes the situation worse if the clock goes back.
>
> The time difference between cur_time (current time) and start_time (time
> when latch started) should always be a positive integer because cur_time is
> always greater than start_time under all normal conditions.
>
> delta_timeout = cur_time - start_time;
>
> The difference can be negative only if the clock shifts to the past, so such
> a shift can be detected. If it can be detected, can it also be corrected?
> I think we can correct it and prevent long sleeps caused by time shifts.
>
> Currently I treat it as TIMEOUT, though conceptually it is not. The ideal
> solution would be to leave this decision to the caller of WaitLatch(). With
> my little knowledge of postgres code, I think TIMEOUT would be fine!
>
>
> (*) The above description holds only for a timed wait. If the latch is
> started with a blocking wait (no timeout), the above logic does not apply.
>
> On Sat, Jun 20, 2015 at 10:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Prakash Itnal <prakash074(at)gmail(dot)com> writes:
>> > Sorry for the late response. The current patch only fixes scenario-1
>> > listed below. It will not address scenario-2. Also, we need a fix in
>> > unix_latch.c, where the remaining sleep time is evaluated if the latch is
>> > woken by other events (or result=0). Here too, the latch might go into a
>> > long sleep if the time shifts to the past.
>>
>> Forcing WL_TIMEOUT if the clock goes backwards seems like quite a bad
>> idea to me. That seems like a great way to make a bad situation worse,
>> ie it induces failures where there were none before.
>>
>> regards, tom lane
>>
>
>
>
> --
> Cheers,
> Prakash
>

--
Cheers,
Prakash
