Re: Tracking wait event for latches

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking wait event for latches
Date: 2016-09-21 21:49:28
Message-ID: CAEepm=08fZ_rmedeZcRrNSLgUaYGJxkNDJ52D3VEqN7HfToawQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 22, 2016 at 1:23 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 20, 2016 at 7:13 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> This paragraph seems a bit confused. I suggest something more like
>> this: "The server process is waiting for one or more sockets, a timer
>> or an interprocess latch. The wait happens in a WaitEventSet,
>> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."
>
> I'm worried we're exposing an awful lot of internal detail here.

Ok, maybe it's not a good idea to mention WaitEventSet in the manual.
I was trying to keep the same basic structure of text as Michael
proposed, but fix the bit that implied that waiting happens "in a
latch", even when no latch is involved. Perhaps only the first
sentence is necessary. This will all become much clearer if there is
a follow-up patch that shows strings like "Socket", "Socket|Latch",
"Latch" instead of a general catch-all wait_event_type of "EventSet",
as discussed.

> Moreover, it's pretty confusing that we have this general concept of
> wait events in pg_stat_activity, and then here the specific type of
> wait event we're waiting for is the ... wait event kind. Uh, what?

Yeah, that is confusing. It comes about because of the coincidence
that pg_stat_activity finished up with a wait_event column, and our IO
multiplexing abstraction finished up with the name WaitEventSet.
<stuck-record-mode>We could instead call these new things "wait
points", because, well, erm, they name points in the code at which we
wait. They don't name events (they just happen to use the
WaitEventSet mechanism, which itself does involve events: but those
events are things like "hey, this socket is now ready for
writing").</>

> I have to admit that I like the individual event names quite a bit,
> and I think the detail will be useful to users. But I wonder if
> there's a better way to describe the class of events that we're
> talking about that's not so dependent on internal data structures.
> Maybe we could divide these waits into a couple of categories - e.g.
> "Socket", "Timeout", "Process" - and then divide these detailed wait
> events among those classes.

Well we already discussed a couple of different ways to get "Socket",
"Latch", "Socket|Latch", ... or something like that into the
wait_event_type column or new columns. Wouldn't that be better, and
automatically fall out of the code rather than needing human curated
categories? Although Michael suggested that that should be done as a
separate patch on top of the basic feature.

> The "SecureRead" and "SecureWrite" wait events are going to be
> confusing, because the connection isn't necessarily secure. I think
> those should be called "ClientRead" and "ClientWrite".
> Comprehensibility is more important than absolute consistency with the
> C function names.

Devil's advocate mode: Then why not improve those function names?
Keeping the wait point names systematically in sync with the actual
code makes things super simple and avoids a whole decision process and
discussion to create new user-friendly obfuscation every time anyone
introduces a new wait point. This is fundamentally an introspection
mechanism allowing expert users to shine a light on the engine and see
what's going on inside it, so I don't see what's wrong with being
straight up about what is actually going on and using the names for
parts of our code that we already have. If that leads us to improve
some function names I'm not sure why that's a bad thing.

Obviously this gets complicated by the existence of static functions
whose names are ambiguous and lack context, and multiple wait points
in a single function. Previously I've suggested a hierarchical
arrangement for these names which might help with that. Imagine names
like: executor.Hash.<function> (reported by a background worker
executing a hypothetical parallel hash join),
executor.Hash.<function>.<something> to disambiguate multiple wait
points in one function, walsender.<function> etc. That way we could
have a tidy curated meaningful naming scheme based on modules, but a
no-brainer systematic way to name the most numerous leaf bits of that
hierarchy. Just an idea...

> Another thing to think about is that there's no way to actually see
> wait event information for a number of the processes which this patch
> instruments, because they don't appear in pg_stat_activity.

Good point. Perhaps there could be another extended view, or system
process view, or some other mechanism. That could be material for a
separate patch?

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2016-09-21 22:27:31 Re: Hash Indexes
Previous Message Tomas Vondra 2016-09-21 21:46:35 Re: Better tracking of free space during SP-GiST index build