| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannuk(at)google(dot)com> |
| Subject: | Re: pg_test_timing: fix unit typo and widen diff type |
| Date: | 2026-04-08 04:06:07 |
| Message-ID: | CDC22691-F63E-4FD3-865A-2982E3B56DCA@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Apr 2, 2026, at 15:28, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>> On Apr 2, 2026, at 12:17, Lukas Fittl <lukas(at)fittl(dot)com> wrote:
>>
>> Hi Chao,
>>
>> On Wed, Apr 1, 2026 at 8:10 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>
>>> Hi,
>>>
>>> This morning, as part of my usual routine, I synced the master branch and read through the recent commits. While reading 82c0cb4e672, I noticed a mistake in an error message. The relevant code is like:
>>> ```
>>> diff = INSTR_TIME_GET_NANOSEC(diff_time);
>>>
>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>> ```
>>>
>>> Here, “diff" is in nanoseconds, but the error message prints ms as the unit, which is incorrect.
>>
>> Good catch!
>>
>> It looks like the use of nanoseconds for "diff" got introduced last
>> year in 0b096e379e6f9bd49 (as you note later in the email, today's
>> commit didn't actually change that part), CCing Tom and Hannu as
>> authors of that earlier change.
>>
>> That said, its a bit odd that we were using INSTR_TIME_GET_MICROSEC
>> there before that earlier commit, but called it "ms" (i.e.
>> milliseconds) in the error message.
>>
>>>
>>> To fix that, I think there are two possible options:
>>>
>>> 1. Use INSTR_TIME_GET_MILLISEC to get “diff"
>>> 2. Change “ms" to “ns" in the error message
>>>
>>> After reading through the whole file, I think option 2 is the right fix. While doing that, I also noticed another issue.
>>>
>>> “diff" is currently defined as int32. Although one might think that is enough for a time delta, I believe it should be int64 for two reasons:
>>>
>>> * INSTR_TIME_GET_NANOSEC() explicitly returns int64:
>>> ```
>>> #define INSTR_TIME_GET_NANOSEC(t) \
>>> ((int64) (t).ticks)
>>> ```
>>>
>>> * The current code has a sanity check for backward clock drift:
>>> ```
>>> /* Did time go backwards? */
>>> if (unlikely(diff < 0))
>>> {
>>> fprintf(stderr, _("Detected clock going backwards in time.\n"));
>>> fprintf(stderr, _("Time warp: %d ms\n"), diff);
>>> exit(1);
>>> }
>>> ```
>>> Clock jumping forward is also possible, and a forward jump of about 2.14 seconds would overflow int32 when expressed in nanoseconds, making the value appear negative. In that case, the code could report a “backwards” clock jump when the actual jump was forwards, which would be misleading.
>>
>> I agree it doesn't seem sound to use an int32 for storing the result
>> of INSTR_TIME_GET_NANOSEC. It looks like we may also need to adjust
>> the call to pg_leftmost_one_pos32 though if we actually accept that
>> large a "diff" value, as in your patch.
>
> You are right. Changed to pg_leftmost_one_pos64 in v2.
>
>>
>> Maybe we should error out if the diff is larger than an int32, noting
>> a positive time drift?
>
> I agree we should warn/fail upon clock forwards drift. But I doubt int32 is too big (~2.14 seconds), I consider 1 second could be a too big threshold. Let’s wait for more voices on this.
>
>>
>> Independently of that, its worth noting we could easily emit the diff
>> in a larger unit (micro or milliseconds) for easier interpretation, by
>> just calling INSTR_TIME_GET_MICROSEC / INSTR_TIME_GET_MILLISEC on the
>> "diff_time" again.
>>
>
> Given the error should rarely happen, I personally feel that might not be super helpful. Also, if the drift is just beyond the threshold, bumping to microsecond or millisecond might print just 0.
>
> PFA v2 - updated 0002 for pg_leftmost_one_pos64.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
>
>
>
> <v2-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch><v2-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch>
PFA v3: Fixed a CI failure.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-pg_test_timing-fix-unit-in-backward-clock-warning.patch | application/octet-stream | 1.3 KB |
| v3-0002-pg_test_timing-use-int64-for-largest-observed-tim.patch | application/octet-stream | 2.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2026-04-08 04:09:53 | Re: Stack-based tracking of per-node WAL/buffer usage |
| Previous Message | shveta malik | 2026-04-08 03:59:59 | Re: doc: Improve wal_level and effective_wal_level GUC around logical replication |