Re: RFC: replace pg_stat_activity.waiting with something more descriptive

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: ik(at)postgresql-consulting(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Date: 2015-06-25 11:49:44
Message-ID: CAA4eK1Lps_FOVoO0pHSHokvVhGcUeeE2rsT5BewzjRncfp9O-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 25, 2015 at 4:28 PM, Ilya Kosmodemiansky <
ilya(dot)kosmodemiansky(at)postgresql-consulting(dot)com> wrote:
>
> On Thu, Jun 25, 2015 at 12:37 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > 2. Add 2 new columns to pg_stat_activity
> > waiting_resource - true for waits other heavy wait locks, false
> > otherwise
> > wait_event - description code for the wait event
> >
> > 3. Add new view 'pg_stat_wait_event' with following info:
> > pid - process id of this backend
> > waiting - true for any form of wait, false otherwise
> > wait_event_type - Heavy Weight Lock, Light Weight Lock, I/O wait, etc
> > wait_event - Lock (Relation), Lock (Relation Extension), etc
>
> Personally I think, that tracking waits is a not a good idea for
> pg_stat_activity (at least in that straight-forward manner).

As mentioned in the initial mail by Robert, that sometimes system becomes
slow (either due to contention on various kinds of locks or due to I/O or
due
to some other such reasons) that such kind of handy information via some
view is quite useful. Recently while working on one of the
performance/scalability
projects, I need to use gdb to attach to different processes to see what
they
are doing (of course one can use perf or some other utilities as well) and I
found most of them were trying to wait on some LW locks, now having such
an information available via view could be really useful, because sometimes
at customer sites, we can't use gdb or perf to see what's going on.

> One
> process can wait for lots of things between 2 sampling of
> pg_stat_activity and that sampling can be pretty useless.
>

Yeah, that's right and I am not sure if we should bother about such
scenario's
as the system is generally fine in such situations, however there are other
cases where we can find most of the backends are waiting on one or other
thing.

> My approach (about which Ive had a talk mentioned by Jim and which I
> hope to finalize and submit within a few days) is a bit different and
> I believe is more useful:
>
> 1. Some sort of histogram of top waits within entire database by pid.
> That will be an approximate one, because I hardly believe there is a
> possibility to make a precise one without significant overhead.
>
> 2. Some cyclic buffer of more precise wait statistic inside each
> worker. Sampling may be turned on if we see some issues in histogram
> (1) and want to have some more details.
>

I think this is some what different kind of utility which can give us
aggregated information and I think this will address different kind of
usecase and will have somewhat more complex design and it doesn't
look impossible to use part of what will be developed as part of this
proposal.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-25 12:03:39 Re: Should we back-patch SSL renegotiation fixes?
Previous Message Amit Kapila 2015-06-25 11:35:22 Re: RFC: replace pg_stat_activity.waiting with something more descriptive