Re: 64-bit wait_event and introduction of 32-bit wait_event_arg

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
Date: 2026-02-12 12:42:23
Message-ID: CAKZiRmxw1KwEPJZk8equXFyFweSt_X9hH59RdSAzpNROGEKG=w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 14, 2026 at 9:56 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Wed, Jan 14, 2026 at 9:38 AM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Fri, Jan 09, 2026 at 11:34:09AM +0100, Jakub Wartak wrote:
> > > On Tue, Dec 9, 2025 at 10:11 AM Jakub Wartak
> > > <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > > >
> > > > Hi Heikki, thanks for having a look!
> > > >
> > > > On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > > > >
> > > > > On 08/12/2025 11:54, Jakub Wartak wrote:
> > > > > > While thinking about cons, the only cons that I could think of is that
> > > > > > when we would be exposing something as 32-bits , then if the following
> > > > > > major release changes some internal structure/data type to be a bit
> > > > > > more heavy, it couldn't be exposed anymore like that (think of e.g.
> > > > > > 64-bit OIDs?)
> > > > > >
> > > > > > Any help, opinions, ideas and code/co-authors are more than welcome.
> > > >
> > > > > Expanding it to 64 bit seems fine as far as performance is concerned. I
> > > > > think the difficult and laborious part is to design the facilities to
> > > > > make use of it.
> > > >
> > > > Right, I'm very interested in hearing what could be added there/what
> > > > people want (bonus points if that is causing some performance issues
> > > > today and we do not have the area covered and exposing that would fit
> > > > in 32-bits ;) )
> > > >
> > >
> > > OK, so v3 is attached. Changes in v3:
> >
> > Thanks for the new version!
> >
> > It looks like that it needs a rebase. Also, FWIW, a quick scan shows a few
> > numbers of "XXX" and elog calls commented out (that are probably used during
> > your own debugging?).
>
> Yes, indeed, that's intentional right now - it's more like a draft
> rather than something that should be polished.
>
> To be honest I would like to avoid sinking more time on it, if the
> sole idea gets shot down or there is opposition due e.g. to concerns
> of exposing 32-bit relfilenodes that way (see that 56-bit relfilenode
> idea).

Good afternoon gentlemen,

I was considering marking this as Rejected/RwF and giving up, because
RelFileNodes could become wider than 32 bits, which kind of goes against
the main intention of this patch (showing the relations involved in some
complex LWLock/MultiXact performance scenarios).

In offline discussions with Andres and Robert I've learned that:
1. there's still a chance that RelFileNodes could become 56 bits wide one day
2. introducing another uint64 just for wait_event_arg is a no-go
due to performance concerns
3. exposing something like "relfilenode % (2^32)" is seen as a hack and could
cause issues (problems with interpretation/conflicts in the future when
RelFileNode gets bigger)
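
To illustrate point 3, here is a minimal sketch of why the truncation is
ambiguous. WideRelFileNumber and both helper names are hypothetical and are
not part of the patchset; the sketch only assumes a relfilenode wider than
32 bits stored in a uint64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a relfilenode widened beyond 32 bits, held in 64 bits. */
typedef uint64_t WideRelFileNumber;

/* The "relfilenode % (2^32)" idea: keep only the low 32 bits. */
static inline uint32_t
truncate_relfilenumber(WideRelFileNumber rfn)
{
	return (uint32_t) (rfn & UINT64_C(0xFFFFFFFF));
}

/*
 * True when two distinct wide values become indistinguishable once
 * truncated, i.e. a reader of the 32-bit value cannot tell which
 * relation was meant.
 */
static inline bool
truncation_collides(WideRelFileNumber a, WideRelFileNumber b)
{
	return a != b &&
		truncate_relfilenumber(a) == truncate_relfilenumber(b);
}
```

Any two values that differ only above bit 31 (for example 16384 and
2^32 + 16384) collide, so the reported number would stop identifying a single
relation as soon as values above 2^32 are in use.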

Anyway, today this WIP/PoC patchset gives:

postgres=# select type, substring(name, 1, 20) wait,
substring(waiteventarg_description,1,43) as desc from pg_get_wait_events()
where waiteventarg_description != '';
  type   |         wait         |                    desc
---------+----------------------+---------------------------------------------
 Buffer  | BufferCleanup        | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferExclusive      | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferShared         | Buffer# or UINT32_MAX for local(temporary)..
 Buffer  | BufferShareExclusive | Buffer# or UINT32_MAX for local(temporary)..
 IO      | SlruFlushSync        | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruRead             | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruSync             | SlruType: unknown(0), notify(1), clog(2), ..
 IO      | SlruWrite            | SlruType: unknown(0), notify(1), clog(2), ..
 IPC     | BufferIo             | Buffer# or UINT32_MAX for local(temporary)
 IPC     | RecoveryConflictTabl | tablespace Oid causing conflict.
 IPC     | SyncRep              | PID of the slowest walsender.
 Timeout | PgSleep              | how many seconds to sleep for.
 Timeout | SpinDelay            | Number of spinlock delays.

Summary of changes since previous version:

- Removed all relfilenode references, including
ProcSleep()->WaitLatch(..PG_WAIT_LOCK | locktag_field2 );
as we cannot take locktag_field2 (which maps to reloid, set by
SET_LOCKTAG_RELATION)

- In pgstat_report_wait_end(), replaced the direct volatile store of zero
with the more proper pg_atomic_write_u64(..,0);

- Separated out the patch for SyncRepWaitForLSN(), as I have plenty of
performance concerns there (with an abnormally high max_wal_senders). Today
there is a full scan under spinlocks every time the latch is reset. I could
reduce that so the scan happens no more often than every N iterations, but
how often should the scan run then?

- Added exposing of Buffer# (one can look up the relation via
pg_buffercache); idea by Andres. It seems to work (simulated by fetching
from a cursor):

  pid   |  type  |  wait_event   | wait_event_arg | state  |       query
--------+--------+---------------+----------------+--------+-------------------
 250556 | Buffer | BufferCleanup |            225 | active | VACUUM (FREEZE)..

postgres=# select
pg_filenode_relation(0, relfilenode)::regclass,
pinning_backends
from pg_buffercache where bufferid = 225;

 pg_filenode_relation | pinning_backends
----------------------+------------------
 pin_test             |                2

- added exposing Timeout/SpinDelay, not sure if that would be helpful
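
For readers who have not opened the patches, the general idea of carrying an
argument next to the wait event in one 64-bit word can be sketched roughly as
below. This is not the patch code: the layout (event identifier in the upper
32 bits, argument in the lower 32 bits) is my assumption for illustration,
C11 atomics stand in for pg_atomic_write_u64(), and every Demo/demo_ name is
hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-backend slot holding event and argument together. */
typedef struct DemoWaitSlot
{
	_Atomic uint64_t wait_event_info;
} DemoWaitSlot;

/* Assumed layout: event identifier in the high half, argument in the low. */
static inline uint64_t
demo_pack_wait(uint32_t event, uint32_t arg)
{
	return ((uint64_t) event << 32) | (uint64_t) arg;
}

/* One atomic store publishes both halves, so a reader never sees a mix. */
static inline void
demo_wait_start(DemoWaitSlot *slot, uint32_t event, uint32_t arg)
{
	atomic_store_explicit(&slot->wait_event_info,
						  demo_pack_wait(event, arg),
						  memory_order_relaxed);
}

/* Ending the wait is a single atomic store of zero. */
static inline void
demo_wait_end(DemoWaitSlot *slot)
{
	atomic_store_explicit(&slot->wait_event_info, 0,
						  memory_order_relaxed);
}

/* Reader side: one atomic load, then split into the two halves. */
static inline void
demo_wait_read(DemoWaitSlot *slot, uint32_t *event, uint32_t *arg)
{
	uint64_t	v = atomic_load_explicit(&slot->wait_event_info,
										 memory_order_relaxed);

	*event = (uint32_t) (v >> 32);
	*arg = (uint32_t) (v & UINT64_C(0xFFFFFFFF));
}
```

Keeping both values in one word is what avoids a second uint64 per backend;
the price is that the argument is limited to 32 bits.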

What's left:
- Earlier Heikki raised the question "Wait events can be defined in extensions;
how does an extension plug into this facility?" - that's still unanswered.
I think they could just OR in a 32-bit value themselves, but maybe we could
provide a way to plug into pg_get_wait_events().waiteventarg_description?
- docs
- of course it could be extended with more reporting if someone comes up
with further ideas
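
On the extension question, one possible shape for such a plug-in point is a
small registry mapping a wait event identifier to a description string. The
sketch below is purely hypothetical: none of these names exist in PostgreSQL
or in the patchset, and a real version would need shared memory and locking.
It only shows the lookup contract, where an unregistered event yields an
empty description.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ARG_DESCRIPTIONS 64

/* Hypothetical registry entry: event identifier -> argument description. */
typedef struct DemoArgDescription
{
	uint32_t	event;
	const char *description;
} DemoArgDescription;

static DemoArgDescription demo_arg_descriptions[DEMO_MAX_ARG_DESCRIPTIONS];
static size_t demo_n_arg_descriptions = 0;

/*
 * Register (or replace) the description for an event.
 * Returns 0 on success, -1 when the registry is full.
 */
static int
demo_register_arg_description(uint32_t event, const char *description)
{
	for (size_t i = 0; i < demo_n_arg_descriptions; i++)
	{
		if (demo_arg_descriptions[i].event == event)
		{
			demo_arg_descriptions[i].description = description;
			return 0;
		}
	}
	if (demo_n_arg_descriptions >= DEMO_MAX_ARG_DESCRIPTIONS)
		return -1;
	demo_arg_descriptions[demo_n_arg_descriptions].event = event;
	demo_arg_descriptions[demo_n_arg_descriptions].description = description;
	demo_n_arg_descriptions++;
	return 0;
}

/* Look up a description; events without one yield an empty string. */
static const char *
demo_lookup_arg_description(uint32_t event)
{
	for (size_t i = 0; i < demo_n_arg_descriptions; i++)
	{
		if (demo_arg_descriptions[i].event == event)
			return demo_arg_descriptions[i].description;
	}
	return "";
}
```

An extension would call the register function once at load time, and the
reporting side would call the lookup when building its output rows.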

-J.

Attachment Content-Type Size
v4-0006-wait_event_arg-expose-buffer-for-Buffer-type-wait.patch text/x-patch 3.7 KB
v4-0002-wait_event_arg-expose-slowest-standby-PID-for-IPC.patch text/x-patch 2.8 KB
v4-0004-Expose-meaning-of-new-per-wait-wait_event_arg-thr.patch text/x-patch 11.3 KB
v4-0005-wait_event_arg-report-number-of-spinlock-delays-f.patch text/x-patch 2.1 KB
v4-0003-wait_event_arg-implement-SLRU-type-reporting-for-.patch text/x-patch 11.1 KB
v4-0001-Convert-wait_event_info-to-64-t-bits-expose-lower.patch text/x-patch 78.1 KB
