Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
Date: 2020-11-09 20:03:53
Message-ID: f36e95736ed8c57d61cfb054fd734194@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-11-09 21:53, Tom Lane wrote:
> Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> writes:
>> After fixing this issue I have noticed that it still dumps blocks
>> twice
>> at each timeout (here I set autoprewarm_interval to 15s):
>> ...
>> This happens because at timeout time we were using continue, but
>> actually we still have to wait the entire autoprewarm_interval after
>> successful dumping.
>
> I don't think your 0001 is correct. It would be okay if apw_dump_now()
> could be counted on to take negligible time, but we shouldn't assume
> that should we?
>

Yes, it seems so, if I understand you correctly. I had a doubt about
possibility of pg_ctl to exit earlier than a dumping process. Now I
added an explicit wait for dump file into test.

> I agree that the "continue" seems a bit bogus, because it's skipping
> the ResetLatch call at the bottom of the loop; it's not quite clear
> to me whether that's a good thing or not. But the general idea of
> the existing code seems to be to loop around and make a fresh
> calculation
> of how-long-to-wait, and that doesn't seem wrong.

I have left the last patch intact, since it resolves the 'double dump'
issue, but I agree with нщгк point about existing logic of the code,
although it is a bit broken. So I have to think more about how to fix it
in a better way.

> 0002 seems like a pretty clear bug fix, though I wonder if this is
> exactly
> what we want to do going forward. It seems like a very large fraction
> of
> the callers of TimestampDifference would like to have the value in
> msec,
> which means we're doing a whole lot of expensive and error-prone
> arithmetic to break down the difference to sec/usec and then put it
> back together again. Let's get rid of that by inventing, say
> TimestampDifferenceMilliseconds(...).

Yeah, I get into this problem after a bug in another extension —
pg_wait_sampling. I have attached 0002, which implements
TimestampDifferenceMilliseconds(), so 0003 just uses this new function
to solve the initial issues. If it looks good to you, then we can switch
all similar callers to it.

> BTW, I see another bug of a related ilk. Look what
> postgres_fdw/connection.c is doing:
>
> TimestampDifference(now, endtime, &secs, &microsecs);
>
> /* To protect against clock skew, limit sleep to one
> minute. */
> cur_timeout = Min(60000, secs * USECS_PER_SEC +
> microsecs);
>
> /* Sleep until there's something to do */
> wc = WaitLatchOrSocket(MyLatch,
> WL_LATCH_SET |
> WL_SOCKET_READABLE |
> WL_TIMEOUT |
> WL_EXIT_ON_PM_DEATH,
> PQsocket(conn),
> cur_timeout, PG_WAIT_EXTENSION);
>
> WaitLatchOrSocket's timeout is measured in msec not usec. I think the
> comment about "clock skew" is complete BS, and the Min() calculation
> was
> put in as a workaround by somebody observing that the sleep waited too
> long, but not understanding why.

I wonder how many troubles one can get with all these unit conversions.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
v2-0004-pg_prewarm-refactor-autoprewarm-waits.patch text/x-diff 891 bytes
v2-0003-pg_prewarm-fix-autoprewarm_interval-behaviour.patch text/x-diff 1.3 KB
v2-0002-Implement-TimestampDifferenceMilliseconds.patch text/x-diff 2.2 KB
v2-0001-pg_prewarm-add-tap-test-for-autoprewarm-feature.patch text/x-diff 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-09 20:25:16 Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
Previous Message Bruce Momjian 2020-11-09 20:01:55 Re: public schema default ACL