Re: RFC: replace pg_stat_activity.waiting with something more descriptive

From: Thom Brown <thom(at)linux(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Vladimir Borodin <root(at)simply(dot)name>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Date: 2016-03-04 13:35:44
Message-ID: CAA-aLv5RONbDSyG2mnpEx8dJ11TXvmqO9GPB6Nncvcdu2fcegA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4 March 2016 at 04:05, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >> I wouldn't bother tinkering with it at this point. The value isn't
>> >> going to be recorded on disk anywhere, so it will be easy to change
>> >> the way it's computed in the future if we ever need to do that.
>> >>
>> >
>> > Okay. Find the rebased patch attached with this mail. I have moved
>> > this patch to upcoming CF.
>>
>> I would call the functions pgstat_report_wait_start() and
>> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
>> pgstat_report_end_waiting().
>>
>
> Changed as per suggestion and made these functions inline.
>
>> I think pgstat_get_wait_event_type should not return HWLock, a term
>> that appears nowhere in our source tree at present. How about just
>> "Lock"?
>>
>
> Changed as per suggestion.
>
>> I think the wait event types should be documented - and the wait
>> events too, perhaps.
>>
>
> As discussed upthread, I have added documentation for all the possible wait
> events and an example. Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purpose, so I have kept there
> explanation as same.
>
>> Maybe it's worth having separate wait event type names for lwlocks and
>> lwlock tranches. We could report LWLockNamed and LWLockTranche and
>> document the difference: "LWLockNamed indicates that the backend is
>> waiting for a specific, named LWLock. The event name is the name of
>> that lock. LWLockTranche indicates that the backend is waiting for
>> any one of a group of locks with similar function. The event name
>> identifies the general purpose of locks in that group."
>>
>
> Changed as per suggestion.
>
>> There's no requirement that every session have every tranche
>> registered. I think we should consider displaying "extension" for any
>> tranche that's not built-in, or at least for tranches that are not
>> registered (rather than "unknown wait event").
>>
>> + if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>>
>> Isn't the second part automatically true at this point?
>>
>
> Fixed.
>
>> The changes to LockBufferForCleanup() don't look right to me. Waiting
>> for a buffer pin might be a reasonable thing to define as a wait
>> event, but it shouldn't reported as if we were waiting on the LWLock
>> itself.
>>
>
> As discussed upthread, added a new wait event BufferPin for this case.
>
>> What happens if an error is thrown while we're in a wait?
>>
>
> As discussed upthread, added in AbortTransaction and from where ever
> LWLockReleaseAll() gets called, point to note is that we can call this
> function only when we are sure there is no further possibility of wait on
> LWLock.
>
>> Does this patch hurt performance?
>>
>
> Performance tests are underway.

I've attached a revised version of the patch with the following corrections:

+ <para>
+ <literal>LWLockTranche</>: The backend is waiting for any one of a
+ group of locks with similar function. The <literal>wait_event</>
+ name for this type of wait identifies the general purpose of locks
+ in that group.
+ </para>

s/with similar/with a similar/

+ <row>
+ <entry><literal>ControlFileLock</></entry>
+ <entry>A server process is waiting to read or update the control file
+ or creation of a new WAL log file.</entry>
+ </row>

As the L in WAL stands for "log" anyway, I think the extra "log" word
can be removed.

+ <row>
+ <entry><literal>RelCacheInitLock</></entry>
+ <entry>A server process is waiting to read or write to relation cache
+ initialization file.</entry>
+ </row>

s/to relation/to the relation/

+ <row>
+ <entry><literal>BtreeVacuumLock</></entry>
+ <entry>A server process is waiting to read or update vacuum related
+ information for Btree index.</entry>
+ </row>

s/vacuum related/vacuum-related/
s/for Btree/for a Btree/

+ <row>
+ <entry><literal>AutovacuumLock</></entry>
+ <entry>A server process which could be autovacuum worker is waiting to
+ update or read current state of autovacuum workers.</entry>
+ </row>

s/could be autovacuum/could be that an autovacuum/
s/read current/read the current/

(discussed with Amit offline about other sources of wait, and he
suggested autovacuum launcher, so I've added that in too)

+ <row>
+ <entry><literal>AutovacuumScheduleLock</></entry>
+ <entry>A server process is waiting on another process to ensure that
+ the table it has selected for vacuum still needs vacuum.
+ </entry>
+ </row>

s/for vacuum/for a vacuum/
s/still needs vacuum/still needs vacuuming/

+ <row>
+ <entry><literal>SyncScanLock</></entry>
+ <entry>A server process is waiting to get the start location of scan
+ on table for synchronized scans.</entry>
+ </row>

s/of scan/of a scan/
s/on table/on a table/

+ <row>
+ <entry><literal>SerializableFinishedListLock</></entry>
+ <entry>A server process is waiting to access list of finished
+ serializable transactions.</entry>
+ </row>

s/to access list/to access the list/

+ <row>
+ <entry><literal>SerializablePredicateLockListLock</></entry>
+ <entry>A server process is waiting to perform operation on list of
+ locks held by serializable transactions.</entry>
+ </row>

s/perform operation/perform an operation/
s/on list/on a list/

+ <row>
+ <entry><literal>AutoFileLock</></entry>
+ <entry>A server process is waiting to update
<filename>postgresql.auto.conf</>
+ file.</entry>
+ </row>

s/to update/to update the/

+ <row>
+ <entry><literal>CommitTsLock</></entry>
+ <entry>A server process is waiting to read or update the last value
+ set for transaction timestamp.</entry>
+ </row>

s/for transaction/for the transaction/

+ <row>
+ <entry><literal>clog</></entry>
+ <entry>A server process is waiting on any one of the clog buffer locks
+ to read or write the clog page in pg_clog subdirectory.</entry>
+ </row>

s/page in/page in the/

The 6 rows that follow that one could apply this same correction.

+ <row>
+ <entry><literal>wal_insert</></entry>
+ <entry>A server process is waiting on any one of the wal_insert locks
+ to write data in wal buffers.</entry>
+ </row>

s/data in/data into/

+ <row>
+ <entry><literal>buffer_content</></entry>
+ <entry>A server process is waiting on any one of the buffer_content
+ locks to read or write data in shared buffers.</entry>
+ </row>

s/data in/data into/

+ <row>
+ <entry><literal>buffer_io</></entry>
+ <entry>A server process is waiting on any one of the buffer_io locks
+ to allow other process to complete the I/O on buffer.</entry>
+ </row>

s/allow other process/allow another process/

+ <row>
+ <entry><literal>buffer_mapping</></entry>
+ <entry>A server process is waiting on any one of the buffer_mapping
+ locks to associate a data block with buffer in buffer pool.</entry>
+ </row>

s/with buffer/with a buffer/
s/in buffer pool/in the buffer pool/

+ <row>
+ <entry><literal>lock_manager</></entry>
+ <entry>A server process is waiting on any one of the
lock_manager locks
+ to add or examine locks for backends or waiting on joining/exiting
+ parallel group to perform parallel query.</entry>
+ </row>

s/exiting parallel/exiting a parallel/
s/perform parallel/perform a parallel/

+ <row>
+ <entry><literal>relation</></entry>
+ <entry>A server process is waiting to acquire lock on a
relation.</entry>
+ </row>

s/acquire lock/acquire a lock/

+ <row>
+ <entry><literal>tuple</></entry>
+ <entry>A server process is waiting to acquire a lock on tuple.</entry>
+ </row>

s/on tuple/on a tuple/

+ <row>
+ <entry><literal>virtualxid</></entry>
+ <entry>A server process is waiting to acquire virtual xid
lock.</entry>
+ </row>

s/acquire virtual/acquire a virtual/

The 5 rows that follow the above one can apply a similar substitution
(i.e. s/acquire/acquire a/ )

+ <para>
+ For tranches registered by extensions, the name is specified by extension
+ and the same will be displayed as <structfield>wait_event</>. It is quite
+ possible that user has registered tranche in one of the backends (by
+ having allocation in dynamic shared memory) in which case other backends
+ won't have that information, so we display <literal>extension</> for such
+ cases.
+ </para>

s/and the same will/and this will/
s/has registered tranche/has registered the tranche/

Regards

Thom

Attachment Content-Type Size
extend_pg_stat_activity_v12.patch binary/octet-stream 56.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-03-04 13:41:36 Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Previous Message Amit Kapila 2016-03-04 13:20:37 Re: RFC: replace pg_stat_activity.waiting with something more descriptive