Re: [Proposal] Add accumulated statistics for wait event

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, imai(dot)yoshikazu(at)fujitsu(dot)com, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "yoshikazu_i443(at)live(dot)jp" <yoshikazu_i443(at)live(dot)jp>
Subject: Re: [Proposal] Add accumulated statistics for wait event
Date: 2021-06-14 14:10:32
Message-ID: 20210614161032.60172b5b@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Andres, Hi all,

First, thank you for your feedback!

Please find in attachment a patch implementing accumulated wait event stats
only from the backend point of view. As I wrote when I reviewed and rebased the
existing patch, I was uncomfortable with the global approach. I still volunteer
to work/review the original approach is required.

See bellow for comments and some more explanations about what I think might be
improvements over the previous patch.

On Fri, 11 Jun 2021 12:18:07 -0700
Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > Date: Tue, 1 Jun 2021 13:25:57 +0200
> > Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> >
> > pg_stat_waitaccum shows counts and duration of each wait events.
> > Each backend/backgrounds counts and measures the time of wait event
> > in every pgstat_report_wait_start and pgstat_report_wait_end. They
> > store those info into their local variables and send to Statistics
> > Collector. We can get those info via Statistics Collector.
> >
> > For reducing overhead, I implemented statistic hash instead of
> > dynamic hash. I also implemented track_wait_timing which
> > determines wait event duration is collected or not.
> I object to adding this overhead. The whole selling point for wait
> events was that they are low overhead. I since spent time reducing the
> overhead further, because even just the branches for checking if
> track_activity is enabled are measurable (225a22b19ed).

Agree. The previous patch I rebased was to review it and reopen this discussion,
I even added a small FIXME in pgstat_report_wait_end and
pgstat_report_wait_start about your work:

//FIXME: recent patch to speed up this call.

In the patch in attachment, I tried to fix this by using kind of an internal
hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to
"instrument" wait events only when required, on the fly, dynamically.

Moreover, I removed the hash structure for a simple static array for faster

> > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > Date: Fri, 4 Jun 2021 18:14:51 +0200
> > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
> > INSTR_TIME to rdtsc.
> >
> > This patch changes measuring method of wait event time from INSTR_TIME
> > (which uses gettimeofday or clock_gettime) to rdtsc. This might reduce the
> > overhead of measuring overhead.
> >
> > Any supports like changing clock cycle to actual time or error handling are
> > not currently implemented.
> rdtsc is a serializing (*) instruction - that's the expensive part. On linux
> clock_gettime() doesn't actually need a syscall. While the vdso call
> implies a bit of overhead over a raw rdtsc, it's a relatively small part
> of the overhead. See

I choose to remove all this rdtsc part from my patch as this wasn't clear how
much faster it was compare to simpler vdso functions and how to accurately
extract a human time.

About my take on $subject, for the sake of simplicity of this PoC, I added
instrumentation to log_statement_stats. Despite the query context of the
reported log, they are really accumulated stats.

The patch updated pg_stat_get_waitaccum() as well to be able to report the
accumulated wait events from your interactive or batch session.

So using my previous fe-time demo client, you can test it using:

PGOPTIONS="--log_statement_stats=on" ./fe-time 100

From logs, I now have (notice the last line):

LOG: duration: 3837.194 ms statement: SELECT * FROM pgbench_accounts
DETAIL: ! system usage stats:
! 0.087444 s user, 0.002106 s system, 3.837202 s elapsed
! [0.087444 s user, 0.003974 s system total]
! 25860 kB max resident size
! 0/0 [0/0] filesystem blocks in/out
! 0/303 [0/697] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 4/18 [5/18] voluntary/involuntary context switches
! Client/ClientWrite 4 calls, 3747102 us elapsed

Using pgbench scale factor 10, the copy query for pgbench_accounts looks like:

LOG: duration: 2388.081 ms statement: copy pgbench_accounts from stdin
DETAIL: ! system usage stats:
! 1.373756 s user, 0.252860 s system, 2.388100 s elapsed
! [1.397015 s user, 0.264951 s system total]
! 37788 kB max resident size
! 0/641584 [0/642056] filesystem blocks in/out
! 194/4147 [195/4728] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 3263/92 [3390/102] voluntary/involuntary context switches
! LWLock/WALBufMapping 3 calls, 654 us elapsed
! LWLock/WALWrite 3 calls, 60680 us elapsed
! LWLock/CheckpointerComm 3 calls, 318 us elapsed
! Client/ClientRead 1 calls, 151 us elapsed
! IO/DataFileExtend 16397 calls, 94000 us elapsed
! IO/DataFileWrite 14346 calls, 45911 us elapsed
! IO/WALInitSync 6 calls, 334840 us elapsed
! IO/WALInitWrite 6 calls, 48040 us elapsed
! IO/WALSync 17 calls, 353334 us elapsed
! IO/WALWrite 8362 calls, 38401 us elapsed

Now, I'm on a fence about the user facing interaction. I'll keep thinking about
this and report this week. In the meantime, any feedback about the current
implementation backbone is welcome.



Attachment Content-Type Size
v1-0001-PoC-Trace-wait-events-to-logfile-when-log_stateme.patch text/x-patch 19.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Don Seiler 2021-06-14 14:16:39 Re: Estimating HugePages Requirements?
Previous Message Bharath Rupireddy 2021-06-14 14:04:59 Re: A new function to wait for the backend exit after termination