| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg |
| Date: | 2025-12-09 09:11:50 |
| Message-ID: | CAKZiRmxeci4QypgYrZbjWqqGZN1+6Ozz+53jPQ4vNP8gGh4aQg@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Heikki, thanks for having a look!
On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 08/12/2025 11:54, Jakub Wartak wrote:
> > While thinking about cons, the only cons that I could think of is that
> > when we would be exposing something as 32-bits , then if the following
> > major release changes some internal structure/data type to be a bit
> > more heavy, it couldn't be exposed anymore like that (think of e.g.
> > 64-bit OIDs?)
> >
> > Any help, opinions, ideas and code/co-authors are more than welcome.
> Expanding it to 64 bit seems fine as far as performance is concerned. I
> think the difficult and laborious part is to design the facilities to
> make use of it.
Right, I'm very interested in hearing what could be added there and what
people want (bonus points if it is something that causes performance
issues today, is not covered by existing instrumentation, and would fit
in 32 bits ;) )
> For example, if you encode an table OID in it, how do
> you interpret that when you're looking at pg_stat_activity? A new
> pg_explain_wait_event(bigint waitevent) that returns a text
> representation of the event perhaps?
Well, initially I was thinking of just leaving it as that (a bigint),
with the interpretation left to the operator (based on the docs). That
is not yet part of the patch, because I still don't know whether the
idea is worth developing further. Technically the wait_event_arg value
is sometimes going to be an OID, sometimes a pid (as in the SyncRep
case), most often probably a reason_code (of the wait), and sometimes
maybe even a hash of something to make it fit. So yeah, I think we
could do that. I like the idea of having a built-in
pg_explain_wait_event_argument(bigint)::text that could give an
additional hint about what the argument really shows without looking
at the docs. The question is what it should return: a simple ::text
like 'reason'/'pid'/'OID', or something more descriptive in English?
And wouldn't English-only output be a problem for translators?
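
To make the bigint idea a bit more concrete, here is a minimal
standalone sketch of how a 64-bit value could carry both halves and how
a pg_explain_wait_event_argument()-style lookup could return a short
kind label. Everything in it is an assumption for illustration: the
layout (existing 32-bit wait_event_info in the low half, wait_event_arg
in the high half), the SKETCH_* event ids and the function names are
made up and are not taken from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event ids, NOT the real PostgreSQL wait event values. */
#define SKETCH_WAIT_SYNC_REP      0x08000001u
#define SKETCH_WAIT_WAL_WRITE     0x0A000001u
#define SKETCH_WAIT_RELATION_LOCK 0x03000001u

typedef enum
{
    WAIT_ARG_NONE = 0,
    WAIT_ARG_REASON,
    WAIT_ARG_PID,
    WAIT_ARG_OID,
    WAIT_ARG_KIND_COUNT
} WaitArgKind;

/* Assumed layout: low 32 bits = wait_event_info, high 32 bits = argument. */
static inline uint64_t
wait_event_pack(uint32_t wait_event_info, uint32_t wait_event_arg)
{
    return ((uint64_t) wait_event_arg << 32) | (uint64_t) wait_event_info;
}

static inline uint32_t
wait_event_info_of(uint64_t packed)
{
    return (uint32_t) (packed & UINT32_MAX);
}

static inline uint32_t
wait_event_arg_of(uint64_t packed)
{
    return (uint32_t) (packed >> 32);
}

/* Hypothetical table saying how to read the argument of each event. */
typedef struct
{
    uint32_t    wait_event_info;
    WaitArgKind kind;
} WaitArgKindEntry;

static const WaitArgKindEntry wait_arg_kinds[] = {
    {SKETCH_WAIT_SYNC_REP, WAIT_ARG_PID},
    {SKETCH_WAIT_WAL_WRITE, WAIT_ARG_REASON},
    {SKETCH_WAIT_RELATION_LOCK, WAIT_ARG_OID},
};

/* Returns a short, untranslated kind label for the argument. */
static inline const char *
explain_wait_event_argument(uint64_t packed)
{
    static const char *const kind_names[WAIT_ARG_KIND_COUNT] = {
        "none", "reason", "pid", "OID"
    };
    uint32_t    info = wait_event_info_of(packed);
    WaitArgKind kind = WAIT_ARG_NONE;

    for (size_t i = 0; i < sizeof(wait_arg_kinds) / sizeof(wait_arg_kinds[0]); i++)
    {
        if (wait_arg_kinds[i].wait_event_info == info)
        {
            kind = wait_arg_kinds[i].kind;
            break;
        }
    }
    assert(kind < WAIT_ARG_KIND_COUNT);
    return kind_names[kind];
}
```

Returning a fixed keyword rather than an English sentence is one way to
sidestep the translation question, at the cost of being less
descriptive.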
The alternative would be to just have a table in the docs (for a
start?) explaining the meaning. In practice you would either hunt for
a specific wait_event or write one big SQL query with CASE WHEN ... ELSE
branches to interpret the values properly.
> Wait events can be defined in extensions; how does an extension plug into this facility?
I have not given extensions a lot of thought or coverage yet, but the
answer is probably: they would not plug heavily into this. I think an
extension could just use WaitEventExtensionNew() /
pgstat_report_wait_start() as usual and then bitwise-OR in some 32-bit
number; the interpretation of wait_event_arg would have to be provided
by the extension itself (via its docs), I guess. Would that approach be
acceptable, or did you have some other idea? With your idea of having
pg_explain_wait_event_argument(), we would maybe have to alter
WaitEventExtensionNew(const char *wait_event_name) and add something
like 'const char *wait_event_arg_description' there?
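
As a rough illustration of that last variant, the sketch below mocks a
registration call that also takes an argument description, plus a lookup
that a pg_explain_wait_event_argument()-like function could use. It is a
self-contained toy and does not call into PostgreSQL: the
sketch_wait_event_extension_new() name, the fixed-size registry, the
SKETCH_EXT_CLASS value and the id layout are all assumptions, and real
code would have to keep the registry in shared memory under a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_EXT_WAIT_EVENTS 16
#define SKETCH_EXT_CLASS           0x07000000u  /* hypothetical class bits */
#define SKETCH_EXT_ID_MASK         0x0000FFFFu

typedef struct
{
    const char *name;
    const char *arg_description;    /* what wait_event_arg means, or NULL */
} SketchExtWaitEvent;

static SketchExtWaitEvent sketch_ext_events[SKETCH_MAX_EXT_WAIT_EVENTS];
static uint32_t sketch_ext_event_count = 0;

/*
 * Hypothetical two-argument variant of WaitEventExtensionNew().
 * Returns the same id when the name is already registered, and 0 when
 * the registry is full.
 */
static inline uint32_t
sketch_wait_event_extension_new(const char *wait_event_name,
                                const char *wait_event_arg_description)
{
    assert(wait_event_name != NULL);

    for (uint32_t i = 0; i < sketch_ext_event_count; i++)
    {
        if (strcmp(sketch_ext_events[i].name, wait_event_name) == 0)
            return SKETCH_EXT_CLASS | i;
    }

    if (sketch_ext_event_count >= SKETCH_MAX_EXT_WAIT_EVENTS)
        return 0;

    sketch_ext_events[sketch_ext_event_count].name = wait_event_name;
    sketch_ext_events[sketch_ext_event_count].arg_description =
        wait_event_arg_description;
    return SKETCH_EXT_CLASS | sketch_ext_event_count++;
}

/* Returns the registered description, or NULL for unknown events. */
static inline const char *
sketch_wait_event_arg_description(uint32_t wait_event_info)
{
    uint32_t id = wait_event_info & SKETCH_EXT_ID_MASK;

    if ((wait_event_info & ~SKETCH_EXT_ID_MASK) != SKETCH_EXT_CLASS)
        return NULL;
    if (id >= sketch_ext_event_count)
        return NULL;
    return sketch_ext_events[id].arg_description;
}
```

The point of the sketch is only that the description travels with the
registration, so the core lookup needs no knowledge of the extension.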
> Inevitably, the extra 32 bits won't be enough to expose everything that
> you might want to expose. Should we already think about what to do then?
Well, I wanted to stick to exposing only stuff that will _always_ fit
in 32 bits. If additional or more detailed instrumentation is
necessary, then a separate monitoring/observability subsystem (or set
of variables) should probably be built for that specific use case. So
if some information can grow beyond 32 bits, it should not be encoded
into wait_event_arg, just to avoid debating performance regressions for
any additional wait-event infrastructure. I simply do not want to
open a can of worms: Bertrand tried that in [1], and I don't want
this thread to follow that route, where Andres and Robert expressed
their concerns earlier. One of the key open questions, where I'm
somewhat lost, is whether we would like to continue the earlier
56-bit [2] / 64-bit OID/RelFileNode attempt(s). If the project wants to
continue with that, then we probably couldn't express a relation id as
a 32-bit wait_event_arg, or maybe I am missing something. (Of course,
we could hash a potential 64-bit OID back into 32 bits one day, but
that sounds like a hack, doesn't it?)
> For lock waits, for example, should we have another array in shared
> memory with more details, and just store an offset into that array in
> the extra wait event bits, for example? (we already have pg_locks, but
> let's imagine we didn't. How would you design it in a green field scenario?
If we didn't have pg_locks, I would probably stick with encoding the
mode, maybe mode|granted|fastpath (assuming OIDs are a no-go).
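
A mode|granted|fastpath argument could be packed roughly as follows.
This is only a sketch under assumed names and bit positions (8 bits for
the mode, one bit each for the two flags); none of the LOCK_ARG_*
macros or lock_arg_* functions exist in PostgreSQL or in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout of a lock-related wait_event_arg. */
#define LOCK_ARG_MODE_MASK 0x000000FFu  /* lock mode number */
#define LOCK_ARG_GRANTED   (1u << 8)    /* lock already granted? */
#define LOCK_ARG_FASTPATH  (1u << 9)    /* taken via the fast path? */

static inline uint32_t
lock_arg_encode(uint32_t mode, bool granted, bool fastpath)
{
    /* The mode must fit into the bits reserved for it. */
    assert(mode <= LOCK_ARG_MODE_MASK);

    return mode
        | (granted ? LOCK_ARG_GRANTED : 0u)
        | (fastpath ? LOCK_ARG_FASTPATH : 0u);
}

static inline uint32_t
lock_arg_mode(uint32_t arg)
{
    return arg & LOCK_ARG_MODE_MASK;
}

static inline bool
lock_arg_granted(uint32_t arg)
{
    return (arg & LOCK_ARG_GRANTED) != 0;
}

static inline bool
lock_arg_fastpath(uint32_t arg)
{
    return (arg & LOCK_ARG_FASTPATH) != 0;
}
```

With this layout the whole argument uses 10 bits, so it stays well
within 32 bits even if more flags are added later.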
Some brainstorming and other crazy(?) ideas on how we could expose some
intrinsic PG behavior:
- writing while reading (AKA setting hint bits) - could be exposed as
reason_code for write-like wait events? (e.g. for IO/WALWrite we could
encode reason_code?)
- same as above (hint bits), but for CLOG/SLRU and maybe for others too?
Maybe we could expose which SLRU exactly we are reading/writing in the
IO/SLRU_READ|WRITE waits and further encode some "reason" there too?
- still for IO/WALWrite, we could also add another reason_code bit
meaning: are we writing an FPI (full-page image) or not? (that would
make wait_event_arg for IO/WALWrite a bitmap: e.g. writing_FPI |
writing_hintbits)
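
The bitmap idea for IO/WALWrite from the list above could look something
like the sketch below. The two flag names follow the writing_FPI and
writing_hintbits wording used there, but the macro names, bit positions
and the describe helper are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reason bits for an IO/WALWrite wait_event_arg. */
#define WALWRITE_REASON_FPI      (1u << 0)  /* writing a full-page image */
#define WALWRITE_REASON_HINTBITS (1u << 1)  /* write caused by hint bits */

/*
 * Renders the set bits as "writing_FPI|writing_hintbits" into buf and
 * returns buf. Output is truncated if buf is too small.
 */
static inline const char *
walwrite_reason_describe(uint32_t arg, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (arg & WALWRITE_REASON_FPI)
        strncat(buf, "writing_FPI", buflen - strlen(buf) - 1);

    if (arg & WALWRITE_REASON_HINTBITS)
    {
        if (buf[0] != '\0')
            strncat(buf, "|", buflen - strlen(buf) - 1);
        strncat(buf, "writing_hintbits", buflen - strlen(buf) - 1);
    }
    return buf;
}
```

Because each reason is an independent bit, a backend could report both
at once, and new reasons could be added without changing the meaning of
the existing ones.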
-J.
[1] - https://www.postgresql.org/message-id/lt6n664ijbmfftnuv3bgvt47q7kjz4tflu4kg3ingv6njjtvld%40kesknxnidemo
[2] - https://www.postgresql.org/message-id/flat/CA%2BTgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg%40mail.gmail.com#1070c79256f2330ec52f063cdbe2add0