Re: stats.sql might fail due to shared buffers also used by parallel tests

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: Re: stats.sql might fail due to shared buffers also used by parallel tests
Date: 2025-07-20 08:00:01
Message-ID: e05868e2-19b2-4cf1-8299-6ac406035eee@gmail.com
Lists: pgsql-hackers

Hello Kuroda-san,

Thank you for your attention to this!

15.07.2025 10:33, Hayato Kuroda (Fujitsu) wrote:
> GetSystemTimePreciseAsFileTime() returns a FILETIME structure, which represents the
> time in UTC at 100-nanosecond intervals [1]. The Stack Overflow answer seems to refer to it.
> However, the documentation for GetSystemTimePreciseAsFileTime() says that the
> resolution is < 1 us [2]. Also, the MS doc [3] does not say that
> GetSystemTimePreciseAsFileTime() returns monotonically increasing values.
> Another API, QueryPerformanceCounter(), seems to be monotonic.
>
> A bit old document [4] also raised the possibility:
>
> ```
> Consecutive calls may return the same result. The call time is less than the
> smallest increment of the system time. The granularity is in the sub-microsecond
> regime. The function may be used for time measurements but some care has to be
> taken: Time differences may be ZERO.
> ```
>
> Also, what if the system clock is modified during the test via NTP?

Yeah, I wrote a simple test for GetSystemTimePreciseAsFileTime() and
confirmed that in my VM it provides sub-microsecond precision. Regarding
NTP, I think the second failure of this kind [1] makes that cause very
unlikely. (Can't wait for the third one to gather more information.)
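
For reference, a probe of this kind might look like the sketch below. It is
not the program used above; the names wall_clock_ns(), ClockProbe and
probe_clock() are made up for illustration, and the non-Windows branch
(clock_gettime) is only a stand-in so that the sketch compiles elsewhere.
It counts how often two consecutive reads return the same value or go
backwards, and records the smallest forward step observed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Wall-clock time in nanoseconds; FILETIME counts 100 ns units. */
static uint64_t
wall_clock_ns(void)
{
    FILETIME        ft;
    ULARGE_INTEGER  u;

    GetSystemTimePreciseAsFileTime(&ft);
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return (uint64_t) u.QuadPart * 100;
}
#else
#include <time.h>

/* Stand-in for non-Windows builds (an assumption, not from the thread). */
static uint64_t
wall_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}
#endif

/* Hypothetical result type for the probe. */
typedef struct ClockProbe
{
    long        equal;      /* consecutive reads with identical values */
    long        backwards;  /* reads smaller than the previous read */
    uint64_t    min_step;   /* smallest forward step in ns, 0 if none seen */
} ClockProbe;

/* Read the clock n times back to back and summarize the differences. */
static ClockProbe
probe_clock(long n)
{
    ClockProbe  p = {0, 0, 0};
    uint64_t    prev;

    assert(n > 0);
    prev = wall_clock_ns();
    for (long i = 0; i < n; i++)
    {
        uint64_t    now = wall_clock_ns();

        if (now == prev)
            p.equal++;
        else if (now < prev)
            p.backwards++;
        else if (p.min_step == 0 || now - prev < p.min_step)
            p.min_step = now - prev;
        prev = now;
    }
    return p;
}
```

Calling probe_clock() with a large n and printing the three fields shows
whether zero time differences occur on a given machine. A nonzero "equal"
count would mean that a strict "<" comparison between two timestamps taken
in quick succession can fail, and a nonzero "backwards" count would point
at clock adjustments.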

>>> 2) pg_stat_reset_subscription_stats(oid) function did not reset the stats.
>>> We have a shared hash 'pgStatLocal.shared_hash'. If the entry
>>> reference (for the subscription) is not found while executing
>>> 'pg_stat_reset_subscription_stats(oid)'. It may not be able to reset
>>> the stats. Maybe somehow this shared hash is getting dropped..
>>> Also, it could be failing due to the same reason as Alexander has
>> I don't think 2) is relevant here, because shared buffers shouldn't affect
>> subscription's statistics.
> To confirm: we do not consider the possibility that pgstat_get_entry_ref() returns
> NULL, right?

I ran a simple experiment with a modification like this:
@@ -1078,6 +1078,7 @@ pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, TimestampTz ts)
     Assert(!pgstat_get_kind_info(kind)->fixed_amount);

     entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+if (rand() % 3 == 0) entry_ref = NULL;
     if (!entry_ref || entry_ref->shared_entry->dropped)

and got several failures like:
--- .../postgresql/src/test/regress/expected/subscription.out 2025-04-25 10:27:32.851554400 -0700
+++ .../postgresql/build/testrun/regress/regress/results/subscription.out 2025-07-20 00:05:05.667903300 -0700
@@ -56,7 +56,7 @@
 SELECT subname, stats_reset IS NULL stats_reset_is_null FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
      subname     | stats_reset_is_null
 -----------------+---------------------
- regress_testsub | f
+ regress_testsub | t
 (1 row)

 -- Reset the stats again and check if the new reset_stats is updated.
@@ -68,11 +68,9 @@
 (1 row)

 SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
- ?column?
-----------
- t
-(1 row)
-
+ERROR:  syntax error at or near ":"
+LINE 1: SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscr...
+

--- .../postgresql/src/test/regress/expected/stats.out    2025-04-25 10:27:36.930322500 -0700
+++ .../postgresql/build/testrun/regress/regress/results/stats.out 2025-07-20 00:05:19.579864900 -0700
@@ -1720,7 +1720,7 @@
 SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
  ?column?
 ----------
- t
+ f
 (1 row)
...

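Thus, a generic failure in pgstat_get_entry_ref() would also break
stats.sql, which is not what we see on the buildfarm. So if there is some
issue with pgstat_get_entry_ref(), it must be specific to subscriptions
and show up in that place only (given the information we have now).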
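
The reasoning can be illustrated with a toy model. This is not PostgreSQL
code: ToyKind, ToyEntry, toy_get_entry_ref() and toy_reset_entry() are
hypothetical names, and the model only assumes that the reset function
returns without doing anything when the lookup yields NULL, as the check
in the patched code suggests. Because every statistics kind goes through
the same reset path, a failing lookup leaves the reset timestamp unset
for any kind, not just for subscriptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two statistics kinds sharing one reset path (toy model only). */
typedef enum ToyKind
{
    KIND_SUBSCRIPTION,
    KIND_BACKEND,
    KIND_COUNT
} ToyKind;

typedef struct ToyEntry
{
    long        counter;    /* some accumulated statistic */
    long        reset_ts;   /* 0 means "never reset" */
} ToyEntry;

static ToyEntry toy_entries[KIND_COUNT];

/* Hypothetical lookup; "fail" mimics the injected NULL result. */
static ToyEntry *
toy_get_entry_ref(ToyKind kind, bool fail)
{
    return fail ? NULL : &toy_entries[kind];
}

/* Returns true if the reset was applied, false if it was skipped. */
static bool
toy_reset_entry(ToyKind kind, long ts, bool fail)
{
    ToyEntry   *ref = toy_get_entry_ref(kind, fail);

    if (ref == NULL)
        return false;       /* skipped silently: nothing is reset */

    ref->counter = 0;
    ref->reset_ts = ts;
    return true;
}
```

With the lookup forced to fail, toy_reset_entry() leaves reset_ts at 0 for
both kinds, which corresponds to "stats_reset IS NULL" staying true in
subscription.out and to the unchanged I/O counters in stats.out.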

So I still suspect some peculiarity of Windows or of this particular
buildfarm animal.

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside a VM, what virtualization software is used?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-07-09%2011%3A02%3A23

Best regards.
Alexander
