Re: Tracking wait event for latches

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Tracking wait event for latches
Date: 2016-09-30 04:47:48
Message-ID: CAB7nPqQCoOnMPptwyjYAG60cJgBfB=Eu4CBy=ATTZwdYgvrC2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It seems to me that you haven't tested this patch very carefully,
> because as far as I can see it breaks wait event reporting for both
> heavyweight locks and buffer pins - or in other words two out of the
> three existing cases covered by the mechanism you are patching.

Oops. The WaitLatch calls overlap the other things if another event is reported.

> The heavyweight lock portion is broken because WaitOnLock() reports a
> Lock wait before calling ProcSleep(), which now clobbers it. Instead
> of reporting that we're waiting for Lock/relation, a LOCK TABLE
> statement that hangs now reports IPC/ProcSleep. Similarly, a conflict
> over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
> any other kind of lock, which were all reported separately before.
> Obviously, that's no good. I think it would be just fine to push down
> setting the wait event into ProcSleep(), doing it when we actually
> WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
> report that we're waiting for the heavyweight lock, not ProcSleep().

Somewhat similar problem with ResolveRecoveryConflictWithBufferPin(),
per this reasoning what we should wait for here is a buffer pin and
not a IPC/WaitForSignal.

> As for buffer pins, note that LockBufferForCleanup() calls
> ProcWaitForSignal(), which now overwrites the wait event set in by its
> caller. I think we could just make ProcWaitForSignal() take a wait
> event as an argument; then LockBufferForCleanup() could pass an
> appropriate constant and other callers of ProcWaitForSignal() could
> pass their own wait events.

Agreed. So changed the patch this way.

> The way that we're constructing the wait event ID that ultimately gets
> advertised in pg_stat_activity is a bit silly in this patch. We start
> with an event ID (which is an integer) and we then do an array lookup
> (via GetWaitEventClass) to get a second integer which is then XOR'd
> back into the first integer (via pgstat_report_wait_start) before
> storing it in the PGPROC. New idea: let's change
> pgstat_report_wait_start() to take a single integer argument which it
> stores without interpretation. Let's separate the construction of the
> wait event into a separate macro, say make_wait_event(). Then
> existing callers of pgstat_report_wait_start(a, b) will instead do
> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
> to construct the wait event and push it through a few levels of the
> call stack before advertising it only need to pass one integer. If
> done correctly, this should be cleaner and faster than what you've got
> right now.

Hm, OK.... So ProcSleep() and ProcWaitForSignal() need an additional
argument in the shape of a uint32 wait_event. So we need to change the
shape of WaitLatch&friends to have also just a uint32 as an extra
argument. This has as result to force all the callers of WaitLatch to
use the new routine pgstat_make_wait_event() (renamed it), so pgstat.h
needs to be included where WaitLatch() is called.

And this has as consequence to make the addition of classId in
WaitEventEntries completely useless, because including them has the
advantage to reduce the places where pgstat.h is included, but as
make_wait_event is needed to the wait_event value...

> I am not a huge fan of the "WE_" prefix you've used. It's not
> terrible, but "we" doesn't obviously stand for "wait event" and it's
> also a word itself. I think a little more verbosity would add
> clarity. Maybe we could go with WAIT_EVENT_ instead.

OK. Switched that back. Hopefully you find the result cleaner.
--
Michael

Attachment Content-Type Size
wait-event-set-v11.patch text/x-diff 48.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-09-30 05:00:15 Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Previous Message Thomas Munro 2016-09-30 04:32:58 Re: Renaming of pg_xlog and pg_clog