Re:Re: pg_test_timing: fix unit typo and widen diff type

From: "wang(dot)xiao(dot)peng" <wxp_728(at)163(dot)com>
To: "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com>, lukas(at)fittl(dot)com
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Hannu Krosing" <hannuk(at)google(dot)com>
Subject: Re:Re: pg_test_timing: fix unit typo and widen diff type
Date: 2026-04-09 06:12:39
Message-ID: 7d4afbdb.5322.19d70df0de4.Coremail.wxp_728@163.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 2026-04-08 12:06:07, "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>> On Apr 2, 2026, at 15:28, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>
>>
>>
>>> On Apr 2, 2026, at 12:17, Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>>>
>>> Hi Chao,
>>>
>>> On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
>>>> ```
>>>> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>>>>
>>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>>> ```
>>>>
>>>> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.
>>>
>>> Good catch!
>>>
>>> It looks like the use of nanoseconds for "diff" got introduced last
>>> year in 0b096e379e6f9bd49 (as you note later in the email, today's
>>> commit didn't actually change that part), CCing Tom and Hannu as
>>> authors of that earlier change.
>>>
>>> That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
>>> there before that earlier commit, but called it "ms" (i.e.
>>> milliseconds) in the error message.
>>>
>>>>
>>>> To fix that, I think there are two possible options:
>>>>
>>>> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
>>>> 2. Change “ms" to “ns" in the error message
>>>>
>>>> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.
>>>>
>>>> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:
>>>>
>>>> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
>>>> ```
>>>> #define INSTR_TIME_GET_NANOSEC(t) \
>>>> ((int64) (t).ticks)
>>>> ```
>>>>
>>>> * The current code has a sanity check for backward clock drift:
>>>> ```
>>>> /* Did time go backwards? */
>>>> if (unlikely(diff < 0))
>>>> {
>>>> fprintf(stderr, _("Detected clock going backwards in time.\n"));
>>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>>> exit(1);
>>>> }
>>>> ```
>>>> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.
>>>
>>> I agree it doesn't seem sound to use an int32 for storing the result
>>> of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
>>> the call to pg_leftmost_one_pos32 though if we actually accept that
>>> large a "diff" value, as in your patch.
>>
>> You are right. Changed to pg_leftmost_one_pos64 in v2.
>>
>>>
>>> Maybe we should error out if the diff is larger than an int32, noting
>>> a positive time drift?
>>
>> I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.
>>
>>>
>>> Independently of that, its worth noting we could easily emit the diff
>>> in a larger unit (micro or milliseconds) for easier interpretation, by
>>> just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
>>> "diff_time" again.
>>>
>>
>> Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.
>>
>> PFA v2 - updated 0002 for pg_leftmost_one_pos64.
>>
>> Best regards,
>> --
>> Chao Li (Evan)
>> HighGo Software Co., Ltd.
>> https://www.highgo.com/
>>
>>
>>
>>
>> <v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>
>
>PFA v3: Fixed a CI failure.
>
>Best regards,
>--
>Chao Li (Evan)
>HighGo Software Co., Ltd.
>https://www.highgo.com/
>
>
Hi Chao, Lukas,
Just finished reviewing the v3 patches - looks good to me. The pg_leftmost_one_pos64 fix is spot on, and using INT64_FORMAT for the error message is the right approach.
Nice catch on the unit mismatch and the int32 overflow issue. Both patches are ready to go from my perspective.
Cheers,
Xiaopeng Wang

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-09 06:14:23 Re: Add errdetail() with PID and UID about source of termination signal
Previous Message Alexander Korotkov 2026-04-09 06:00:06 Re: Bug: WAIT FOR LSN crashes with assertion failure inside PL/pgSQL DO blocks and procedures