Re: [Proposal] Add accumulated statistics for wait event

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, imai(dot)yoshikazu(at)fujitsu(dot)com, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "yoshikazu_i443(at)live(dot)jp" <yoshikazu_i443(at)live(dot)jp>
Subject: Re: [Proposal] Add accumulated statistics for wait event
Date: 2021-06-14 14:10:32
Message-ID: 20210614161032.60172b5b@firost
Lists: pgsql-hackers

Hi Andres, Hi all,

First, thank you for your feedback!

Please find attached a patch implementing accumulated wait event stats from the
backend point of view only. As I wrote when I reviewed and rebased the existing
patch, I was uncomfortable with the global approach. I still volunteer to work
on/review the original approach if required.

See below for comments and some more explanation of what I think might be
improvements over the previous patch.

On Fri, 11 Jun 2021 12:18:07 -0700
Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > Date: Tue, 1 Jun 2021 13:25:57 +0200
> > Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> >
> > pg_stat_waitaccum shows counts and duration of each wait events.
> > Each backend/backgrounds counts and measures the time of wait event
> > in every pgstat_report_wait_start and pgstat_report_wait_end. They
> > store those info into their local variables and send to Statistics
> > Collector. We can get those info via Statistics Collector.
> >
> > For reducing overhead, I implemented statistic hash instead of
> > dynamic hash. I also implemented track_wait_timing which
> > determines wait event duration is collected or not.
>
> I object to adding this overhead. The whole selling point for wait
> events was that they are low overhead. I since spent time reducing the
> overhead further, because even just the branches for checking if
> track_activity is enabled are measurable (225a22b19ed).

Agreed. I rebased the previous patch only to review it and reopen this
discussion. I even added a small FIXME in pgstat_report_wait_end and
pgstat_report_wait_start pointing to your work:

//FIXME: recent patch to speed up this call.

In the attached patch, I tried to address this by using a kind of internal
hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows wait
events to be instrumented only when required, on the fly, dynamically.

Moreover, I replaced the hash structure with a simple static array for faster
access. A sketch of both ideas follows.
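
To make that more concrete, here is a minimal sketch of the two ideas
combined. All names here (wait_accum, wait_start_hook, WAIT_EVENT_SLOTS, ...)
are hypothetical; this is not the actual patch code:

/*
 * Minimal sketch only: hypothetical names, not the actual patch code.
 * The hot path pays one indirect call; the instrumented variants are
 * installed only when accumulation is requested.
 */
#include <stdint.h>
#include <time.h>

#define WAIT_EVENT_SLOTS 1024		/* hypothetical bound on event ids */

typedef struct WaitAccumEntry
{
	uint64_t	calls;			/* number of waits observed */
	uint64_t	elapsed_us;		/* accumulated wait time */
} WaitAccumEntry;

/* static array indexed by wait event id: no hash lookup on the hot path */
static WaitAccumEntry wait_accum[WAIT_EVENT_SLOTS];
static struct timespec wait_begin;
static uint32_t current_event;

/* cheap defaults: instrumentation disabled */
static void wait_start_noop(uint32_t event) { (void) event; }
static void wait_end_noop(void) { }

/* instrumented variants */
static void
wait_start_instr(uint32_t event)
{
	current_event = event;
	clock_gettime(CLOCK_MONOTONIC, &wait_begin);
}

static void
wait_end_instr(void)
{
	struct timespec end;
	WaitAccumEntry *e = &wait_accum[current_event % WAIT_EVENT_SLOTS];

	clock_gettime(CLOCK_MONOTONIC, &end);
	e->calls++;
	e->elapsed_us += (uint64_t) ((end.tv_sec - wait_begin.tv_sec) * 1000000L
		+ (end.tv_nsec - wait_begin.tv_nsec) / 1000);
}

/* the "internal hooks", swapped at runtime to toggle instrumentation */
static void (*wait_start_hook)(uint32_t) = wait_start_noop;
static void (*wait_end_hook)(void) = wait_end_noop;

pgstat_report_wait_start/end would then call through wait_start_hook and
wait_end_hook, keeping the disabled path as close as possible to a no-op.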

> > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
> > Date: Fri, 4 Jun 2021 18:14:51 +0200
> > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
> > INSTR_TIME to rdtsc.
> >
> > This patch changes measuring method of wait event time from INSTR_TIME
> > (which uses gettimeofday or clock_gettime) to rdtsc. This might reduce the
> > overhead of measuring overhead.
> >
> > Any supports like changing clock cycle to actual time or error handling are
> > not currently implemented.
>
> rdtsc is a serializing (*) instruction - that's the expensive part. On linux
> clock_gettime() doesn't actually need a syscall. While the vdso call
> implies a bit of overhead over a raw rdtsc, it's a relatively small part
> of the overhead. See
> https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de

I chose to remove the whole rdtsc part from my patch, as it wasn't clear how
much faster it is compared to the simpler vDSO functions, nor how to
accurately convert cycle counts into human-readable time.
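
For anyone who wants to get a feel for that comparison, a rough standalone
harness like the one below could be used to time both clock sources. It is a
hypothetical sketch, x86-64 Linux only, and not part of the patch:

/*
 * Rough micro-benchmark sketch: compare raw rdtsc reads with the
 * vDSO-backed clock_gettime(). x86-64 Linux only; build with -O2.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>			/* __rdtsc() */

#define LOOPS 1000000

int
main(void)
{
	struct timespec ts;
	uint64_t	begin,
				end;
	int			i;

	/* raw TSC reads: fast, but count CPU cycles, not wall-clock time */
	begin = __rdtsc();
	for (i = 0; i < LOOPS; i++)
		(void) __rdtsc();
	end = __rdtsc();
	printf("rdtsc:         %llu cycles for %d reads\n",
		   (unsigned long long) (end - begin), LOOPS);

	/* clock_gettime(): no syscall on Linux thanks to the vDSO */
	begin = __rdtsc();
	for (i = 0; i < LOOPS; i++)
		clock_gettime(CLOCK_MONOTONIC, &ts);
	end = __rdtsc();
	printf("clock_gettime: %llu cycles for %d reads\n",
		   (unsigned long long) (end - begin), LOOPS);
	return 0;
}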

Regarding my take on $subject: for the sake of simplicity of this PoC, I added
the instrumentation to log_statement_stats. Despite the per-query context of
the reported log, these really are accumulated stats.

The patch also updates pg_stat_get_waitaccum() so the accumulated wait events
can be reported from an interactive or batch session.

So, using my previous fe-time demo client, you can test it with:

PGOPTIONS="--log_statement_stats=on" ./fe-time 100

In the logs, I now have (note the last line):

LOG: duration: 3837.194 ms statement: SELECT * FROM pgbench_accounts
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 0.087444 s user, 0.002106 s system, 3.837202 s elapsed
! [0.087444 s user, 0.003974 s system total]
! 25860 kB max resident size
! 0/0 [0/0] filesystem blocks in/out
! 0/303 [0/697] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 4/18 [5/18] voluntary/involuntary context switches
! Client/ClientWrite 4 calls, 3747102 us elapsed

Using pgbench scale factor 10, the copy of pgbench_accounts looks like this:

LOG: duration: 2388.081 ms statement: copy pgbench_accounts from stdin
LOG: QUERY STATISTICS
DETAIL: ! system usage stats:
! 1.373756 s user, 0.252860 s system, 2.388100 s elapsed
! [1.397015 s user, 0.264951 s system total]
! 37788 kB max resident size
! 0/641584 [0/642056] filesystem blocks in/out
! 194/4147 [195/4728] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 3263/92 [3390/102] voluntary/involuntary context switches
! LWLock/WALBufMapping 3 calls, 654 us elapsed
! LWLock/WALWrite 3 calls, 60680 us elapsed
! LWLock/CheckpointerComm 3 calls, 318 us elapsed
! Client/ClientRead 1 calls, 151 us elapsed
! IO/DataFileExtend 16397 calls, 94000 us elapsed
! IO/DataFileWrite 14346 calls, 45911 us elapsed
! IO/WALInitSync 6 calls, 334840 us elapsed
! IO/WALInitWrite 6 calls, 48040 us elapsed
! IO/WALSync 17 calls, 353334 us elapsed
! IO/WALWrite 8362 calls, 38401 us elapsed

Now, I'm on the fence about the user-facing interaction. I'll keep thinking
about it and report back this week. In the meantime, any feedback about the
backbone of the current implementation is welcome.

Thanks!

Regards,

Attachment Content-Type Size
v1-0001-PoC-Trace-wait-events-to-logfile-when-log_stateme.patch text/x-patch 19.3 KB
