| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg |
| Date: | 2026-03-02 08:39:22 |
| Message-ID: | CAKZiRmy9MKBivq0Wzm6E01H4SvLn-KhfjpKTTjvmmd4rTHXTzA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Feb 16, 2026 at 12:34 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Fri, Feb 13, 2026 at 5:46 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Thu, Feb 12, 2026 at 01:42:23PM +0100, Jakub Wartak wrote:
>
> Hi Michael, thanks for taking a look at this!
>
> > > What's left:
> > > - Earlier Heikki raised the question "Wait events can be defined in extensions;
> > > how does an extension plug into this facility?" - that's still unanswered.
> >
> > Reserving the full 8 bytes to the callers of WaitEventExtensionNew()
> > and WaitEventInjectionPointNew() would be an error, because we would
> > forever lock down the possibility for extensions to set at will the 4
> > extra bytes that become available when setting some extra data in
> > parallel of a wait event name. The results of these routines should
> > still be 4 bytes for the "static" part of the wait event names, not 8.
>
> When trying to understand this I found that the patch contained assumption
> error on my side that might have mislead You: I was blind to the fact that
> all custom wait events land in PG_WAIT_EXTENSION class, so we need in
> WaitEventCustomNew() to shift the resulting eventId from nextId++ to the
> proper place in 64-bit word (this leaves the 32-bits part (for
> wait_event_arg) initially as zero when returned by those routines)
>
> .. and then, well, the registration of the new custom wait event is something
> distinct from starting the waiting on it, right? I mean the extension
> could:
>
> // it will contain PG_WAIT_EXTENSION | shifted_eventId_from_nextId
> // + zero on all 32-bit lower bits (for wait_event_arg)
> uint64 custom_wait_event = WaitEventExtensionNew("custom");
> libpqsrv_connect(..,custom_wait_event);
> ..
> // and we can still add 32-bit argument there
> libpqsrv_connect(..,custom_wait_event | some_wait_event_arg);
>
> Because probably the code was misleading Could I ask You to
> re-evaluate the above
> concern based on the fixed code?
>
> > > I think they could just OR 32-bit value themselves, but maybe we could
> > > just provide a way to plug into pg_get_wait_events().waiteventarg_description?
> >
> > The value provided back to pg_stat_activity would be a 4-byte integer
> > under this design, whose interpretation is up to the client, I guess,
> > with a filter based on the wait event name found (likely a CASE/ELSE
> > to force casts back to a text value at the end in most cases?). That
> > may be annoying for client applications, though, but perhaps
> > acceptable as this provides extra information with a single atomic
> > write.
>
> Yes, today we just return the lower half of 64-bits as
> pg_stat_activity.wait_event_arg::integer. The usage could be be interpreted
> by the SQL query doing CASE/ELSE if necessary (or just by human when directly
> looking at the pg_get_wait_events().waiteventarg_description -- that is what
> is implemented today for the IO/Slru* wait event and their wait_event_arg),
> e.g.:
>
> postgres=# select type, substring(name, 1, 20) wait,
> waiteventarg_description as desc
> from pg_get_wait_events() where type = 'IO' and name like 'Slru%';
> type | wait |
> desc
> ------+---------------+----------------------------------------------------------------------------------------------------------------------------------
> IO | SlruFlushSync | SlruType: unknown(0), notify(1), clog(2),
> subtrans(3), committs(4), multixactoffset (5), multixactmembers(6),
> serialializable(7)
> IO | SlruRead | SlruType: unknown(0), notify(1), clog(2),
> subtrans(3), committs(4), multixactoffset (5), multixactmembers(6),
> serialializable(7)
> [..]
>
> so one of course could write something like to interpret it:
> SELECT *, CASE WHEN wait_event_arg=0 THEN 'unknown'
> WHEN wait_event_arg=1 THEN 'notify'
> [..]
> ELSE 'other'
> END as
> FROM pg_stat_activity WHERE wait_event LIKE 'Slru%';
>
> > At the end, the way these 4 extra bytes can be set by extensions is an
> > API problem for me, and I suspect that the correct way to extend
> > things, on top of forcing the use of 4 bytes for the ID of the fixed
> > event ID (perhaps just define a type here anyway?), would be to patch
> > the most popular APIs that extensions currently use to let them set
> > the value they want for the extra 4 bytes. The first choice that
> > comes into mind here is the family of WaitLatchOrSocket() APIs, that
> > could have an extra argument with a uint32 for the extra data. That's
> > a popular one among extension developers.
>
> Right now in the patchset, the WaitLatchOrSocket() and WaitLatch() take
> uint64 argument, I've done that so that we I could minimize the patchset
> footprint and then e.g. I do still:
> - rc = WaitLatch(MyLatch, ..., WAIT_EVENT_SYNC_REP);
> + rc = WaitLatch(MyLatch, ..., WAIT_EVENT_SYNC_REP | wait_event_arg_pid);
> (so uint64 that is OR-ed with CLASS and 32-bit lower e.g. pid here)
> so without many internal unnecessary changes as WaitLatch() is used heavily.
>
> BTW: note that all of the WAIT_EVENT_* are now 64-bits, so it allows to
> work that way.
>
> One problem with that approach is that with WaitLatchOrSocket()/WaitLatch()
> the extensions would have to be altered (the extension code can pass uint32
> silently - it will compile and won't show a good wait event in runtime).
>
> Sadly this won't be detected without -Wconversion (silent casting will
> uint32->uint64) and I don't have other idea how silent casting could
> be detected during extension compilation, as even
> `#pragma GCC diagnostic warning "-Wconversion"` won't be effective in
> 3rd party code using this.
>
> > By the way, patch 0001 includes a log file from pg_plan_advice with
> > some information I suspect you did not intend to send..
>
> Oops, fixed, thanks.
>
> New version attached, it also fixed the dblink bug where it was not showing
> proper wait events (missing uint32 to uint64 change).
>
> Probably one thing remaining is also the API about pg_get_wait_events(). To
> me it looks like:
> - the extensions do not have way to pass description (it's just static
> "Waiting for custom wait event \"%s\" defined by extension module")
> - therefore they also miss way to pass waiteventarg_description
>
> I think there are two ways there:
> a) either display static text for waiteventarg_description like:
> "consult extension documentation for details"
>
> b) or we could add some more extended way to initialize:
> WaitEventExtensionNew(
> const char *wait_event_name,
> const char *wait_event_desc,
> const char *wait_event_arg_desc);
>
> but I seriously doubt it would be used by anyone, but maybe I'm wrong.
v6 attached, rebased due to 412f78c66eedbe9.
-J.
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0005-wait_event_arg-report-number-of-spinlock-delays-f.patch | text/x-patch | 2.1 KB |
| v6-0004-Expose-meaning-of-new-per-wait-wait_event_arg-thr.patch | text/x-patch | 11.3 KB |
| v6-0003-wait_event_arg-implement-SLRU-type-reporting-for-.patch | text/x-patch | 11.1 KB |
| v6-0002-wait_event_arg-expose-slowest-standby-PID-for-IPC.patch | text/x-patch | 2.8 KB |
| v6-0006-wait_event_arg-expose-buffer-for-Buffer-type-wait.patch | text/x-patch | 3.7 KB |
| v6-0001-Convert-wait_event_info-to-64-t-bits-expose-lower.patch | text/x-patch | 67.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexandre Felipe | 2026-03-02 09:00:34 | Re: index prefetching |
| Previous Message | Joel Jacobson | 2026-03-02 08:39:15 | Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq |