Re: Tracking wait event for latches

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking wait event for latches
Date: 2016-09-29 16:48:37
Message-ID: CA+TgmoYoT-reYMTN5kau_mK+UY2dB9O6hrKboGEizQ93A1Cm8g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> So should I change back the patch to have only one argument for the
>>> eventId, and guess classId from it?
>>
>> Why would you need to guess?
>
> Incorrect wording from me, perhaps? I just meant that processing
> needs to know which classId corresponds to a given eventId.
>
>> But, yes, I think one argument is much preferable.
>
> OK. Here is the updated patch. I have reduced WaitLatch & friends to
> take only one argument, and recorded the classId associated with each
> eventId in an array with multiple fields, giving something like this:
> + const struct wait_event_entry WaitEventEntries[] = {
> + /* Activity */
> + {WAIT_ACTIVITY, "ArchiverMain"},
> [...]
>
> I have cleaned up as well the inclusions of pgstat.h that I added
> previously. Patch is attached.

It seems to me that you haven't tested this patch very carefully,
because as far as I can see it breaks wait event reporting for both
heavyweight locks and buffer pins - or in other words two out of the
three existing cases covered by the mechanism you are patching.

The heavyweight lock portion is broken because WaitOnLock() reports a
Lock wait before calling ProcSleep(), which now clobbers it. Instead
of reporting that we're waiting for Lock/relation, a LOCK TABLE
statement that hangs now reports IPC/ProcSleep. Similarly, a conflict
over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
any other kind of lock, which were all reported separately before.
Obviously, that's no good. I think it would be just fine to push down
setting the wait event into ProcSleep(), doing it when we actually
WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
report that we're waiting for the heavyweight lock, not ProcSleep().

As for buffer pins, note that LockBufferForCleanup() calls
ProcWaitForSignal(), which now overwrites the wait event set in by its
caller. I think we could just make ProcWaitForSignal() take a wait
event as an argument; then LockBufferForCleanup() could pass an
appropriate constant and other callers of ProcWaitForSignal() could
pass their own wait events.

The way that we're constructing the wait event ID that ultimately gets
advertised in pg_stat_activity is a bit silly in this patch. We start
with an event ID (which is an integer) and we then do an array lookup
(via GetWaitEventClass) to get a second integer which is then XOR'd
back into the first integer (via pgstat_report_wait_start) before
storing it in the PGPROC. New idea: let's change
pgstat_report_wait_start() to take a single integer argument which it
stores without interpretation. Let's separate the construction of the
wait event into a separate macro, say make_wait_event(). Then
existing callers of pgstat_report_wait_start(a, b) will instead do
pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
to construct the wait event and push it through a few levels of the
call stack before advertising it only need to pass one integer. If
done correctly, this should be cleaner and faster than what you've got
right now.

I am not a huge fan of the "WE_" prefix you've used. It's not
terrible, but "we" doesn't obviously stand for "wait event" and it's
also a word itself. I think a little more verbosity would add
clarity. Maybe we could go with WAIT_EVENT_ instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
