Re: when the startup process doesn't (logging startup delays)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-09-30 18:41:42
Message-ID: CA+TgmoZTv==CKVZ+dZ69E-YxSpi2bFX-HL6+bBsKWvi4+NO_bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 29, 2021 at 5:12 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I didn't claim there are any other places that could use the feature
> *today*. But once we've got one, it seems like there could be more
> tomorrow. In any case, I dislike keeping timeout state data outside
> timeout.c, because it's so likely to get out-of-sync that way.

Well, I had a quick go at implementing this (attached).

It seems to do a satisfactory job preventing drift over time, but it
doesn't behave nicely if you set the system clock backward. With a bit
of extra debugging output:

2021-09-30 14:23:50.291 EDT [2279] LOG: scheduling wakeup in 2 secs,
998727 usecs
2021-09-30 14:23:53.291 EDT [2279] LOG: scheduling wakeup in 2 secs,
998730 usecs
2021-09-30 14:23:56.291 EDT [2279] LOG: scheduling wakeup in 2 secs,
998728 usecs
2021-09-30 14:20:01.154 EDT [2279] LOG: scheduling wakeup in 238
secs, 135532 usecs
2021-09-30 14:23:59.294 EDT [2279] LOG: scheduling wakeup in 2 secs, 995922 use

The issue here is that fin_time is really the driving force behind
everything timeout.c does. In particular, it determines the order of
active_timeouts[]. And that's not really correct either for
enable_timeout_after(), or for the new function I added in this draft
patch, enable_timeout_every(). When I say I want my handler to be
fired in 3 s, I don't mean that I want it to be fired when the system
time is 3 seconds greater than it is right now. I mean I want it to be
fired in 3 actual seconds, regardless of what dumb thing the system
clock may choose to do. I don't really think that precise behavior can
be implemented, but ideally if a timeout that was supposed to happen
after 3 s is now scheduled for a time that is more than 3 seconds
beyond the current value of the system clock, we'd move the firing
time backwards to 3 seconds beyond the current system clock. That way,
if you set the system time backward by 4 minutes, you might see a few
seconds of delay before the next firing, but you wouldn't go into the
tank for 4 minutes.

I don't see an obvious way of making timeout.c behave like that
without major surgery, though. If nobody else does either, then we
could (1) stick with something closer to Nitin's current patch, which
tries to handle this concern outside of timeout.c, (2) adopt something
like the attached 0001 and leave the question of improved behavior in
case of backwards system clock adjustments for another day, or (3)
undertake to rewrite timeout.c as a precondition of being able to log
messages about why startup is slow. I'm not a huge fan of (3) but I'm
open to suggestions.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0002-Quick-testing-hack.patch application/octet-stream 2.2 KB
0001-Add-enable_timeout_every-to-fire-the-same-timeout-re.patch application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-09-30 18:53:56 Re: prevent immature WAL streaming
Previous Message Andres Freund 2021-09-30 18:36:41 002_types.pl fails on some timezones on windows