Re: pg_stat_activity.waiting_start

From: Joel Jacobson <joel(at)trustly(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_activity.waiting_start
Date: 2016-12-25 16:30:00
Message-ID: CAASwCXeEd5mqtq15gww5jQWREA+np3zxFgm5SpTJBkp_3eW5vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 25, 2016 at 8:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel(at)trustly(dot)com> wrote:
>
>>Is it really a typical real-life scenario that processes can be
>>waiting extremely often for extremely short periods of time,
>>where the timing overhead would be significant?
>
> Yes. Consider WAL insertion, procarray or other similar contended locks.

Ah, I see, then I understand it has to be blazingly fast.

Maybe a good tradeoff then would be to let "wait_start" represent the
very first time the txn started waiting?
That way gettimeofday() would only be called once per txn, and the
value would be remembered, but not exposed when the txn is not
waiting.
If the txn is waiting/not waiting multiple times during it's
life-time, the same "wait_start" value would be exposed when it's
waiting, and NULL when it's not. Sounds good?

As long as the documentation is clear on "wait_start" meaning when the
first wait started in the txn, I think that's useful enough to improve
the situation, as one could then ask a query like "select all
processes that have possibly been waiting for at least 5 seconds",
which you cannot do today.

The best you can do today is ask a query like "select all processes
that are waiting and have been running for at least 5 seconds", but
during those 5 seconds they have been running, they might only have
been waiting for the very last few milliseconds, which might not be a
problem at all. If instead knowing they were waiting 5 seconds ago,
and are still waiting, but might have had periods in between where
they were not waiting, I would say that is close enough to what I as a
user want to know, and can use that information for automatic
decision-making on e.g. if I want to terminate other blocking
processes.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2016-12-26 06:19:33 Re: Patch: Write Amplification Reduction Method (WARM)
Previous Message Craig Ringer 2016-12-25 13:58:34 Re: Cluster wide option to control symbol case folding