Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ik(at)postgresql-consulting(dot)com
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
Date: 2014-10-03 18:39:11
Message-ID: 20141003183911.GI14522@momjian.us
Lists: pgsql-hackers

On Fri, Oct 3, 2014 at 05:53:59PM +0200, Ilya Kosmodemiansky wrote:
> > What that gives us is almost zero overhead on backends, high
> > reliability, and the ability of the scan daemon to give higher weights
> > to locks that are held longer. Basically, if you just stored the locks
> > you held and released, you either have to add timing overhead to the
> > backends, or you have no timing information collected. By scanning
> > active locks, a short-lived lock might not be seen at all, while a
> > longer-lived lock might be seen by multiple scans. What that gives us
> > is a weighting of the lock time with almost zero overhead. If we want
> > finer-grained lock statistics, we just increase the number of scans per
> > second.
>
> So I could add a function which will accumulate the data in some
> view/table (with weights etc.). How should it be called? From a
> specific process? From some existing maintenance process such as
> autovacuum? Should I implement a GUC, for example lwlock_pull_rate:
> 0 for off, 1 to 10 for 1 to 10 samples per second?

Yes, that's the right approach. You would implement it as a background
worker process, with a GUC as you described. I assume it would populate
a view like we already do for the pg_stat_ views, and the counters could
be reset somehow. I would pattern it after how we handle the pg_stat_
views.
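
Something along these lines, say -- a minimal sketch against the 9.4
bgworker API, where the GUC name lwlock_pull_rate comes from your
message and sample_active_lwlocks() is a made-up placeholder for the
actual scan:

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void _PG_init(void);

/* 0 = off, 1..10 = scans per second */
static int lwlock_pull_rate = 0;

/*
 * Placeholder: walk the backends, record which LWLocks are held
 * right now, and bump per-lock counters in shared memory.
 */
static void
sample_active_lwlocks(void)
{
}

static void
lwlock_scan_main(Datum main_arg)
{
    /*
     * Default bgworker SIGTERM handling (exit) is fine for a sketch;
     * a real worker would also catch SIGHUP and re-read the config so
     * that PGC_SIGHUP changes to lwlock_pull_rate take effect.
     */
    BackgroundWorkerUnblockSignals();

    for (;;)
    {
        int rc;

        if (lwlock_pull_rate > 0)
            sample_active_lwlocks();

        /* sleep 1000/rate ms; poll once a second while disabled */
        rc = WaitLatch(&MyProc->procLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       lwlock_pull_rate > 0 ? 1000L / lwlock_pull_rate : 1000L);
        ResetLatch(&MyProc->procLatch);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
    }
}

void
_PG_init(void)
{
    BackgroundWorker worker;

    DefineCustomIntVariable("lwlock_pull_rate",
                            "LWLock scans per second (0 disables sampling).",
                            NULL,
                            &lwlock_pull_rate,
                            0, 0, 10,
                            PGC_SIGHUP,
                            0,
                            NULL, NULL, NULL);

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_PostmasterStart;
    worker.bgw_restart_time = 5;        /* seconds */
    worker.bgw_main = lwlock_scan_main;
    snprintf(worker.bgw_name, BGW_MAXLEN, "lwlock scan daemon");

    RegisterBackgroundWorker(&worker);
}

The sampling itself, and the shared-memory counters a pg_stat_lwlock
view would read, are the real work; the above only shows the rate
plumbing. Note it has to be loaded via shared_preload_libraries so
_PG_init() runs in the postmaster.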

> > I am assuming almost no one cares about the number of locks, but rather
> > they care about cumulative lock durations.
>
> Oracle and DB2 measure both cumulative durations and counts.

Well, the big question is whether counts are really useful. You did a
good job of explaining that when you find heavy clog or xlog lock usage
you would adjust your server. What I am unclear about is why you would
adjust your server based on lock _counts_ rather than cumulative lock
duration. I don't think we want the overhead of accumulating
information that isn't useful.
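
To make the weighting concrete (scan rate and hold times invented for
illustration): at 10 scans per second,

    expected samples for a lock ~= (cumulative seconds held) * 10

so a lock held for a cumulative 2 seconds is seen in ~20 scans, whether
that was one 2-second hold or two hundred 10 ms holds. Raw counts would
report 1 vs. 200 for the same total contention, which is why the sampled
numbers track duration, not acquisition counts.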

> > I am having trouble seeing any other option that has such a good
> > cost/benefit profile.
>
> At least the cost. The Oracle documentation clearly states that it is
> all about diagnostic convenience; the performance impact is significant.

Oh, we don't want to go there then, and I think this approach is a big
win.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
