RE: [Proposal] Add accumulated statistics for wait event

From: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>
To: 'Craig Ringer' <craig(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add accumulated statistics for wait event
Date: 2020-02-12 09:40:57
Message-ID: OSBPR01MB46165923CDAD6008A64BE2C9941B0@OSBPR01MB4616.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wed, Feb 12, 2020 at 5:42 AM, Craig Ringer wrote:
> > It seems the performance difference is big in the case of read-only tests. The reason is that the write time is
> > relatively longer than the processing time of the logic I added in the patch.
>
> That's going to be a pretty difficult performance hit to justify.
>
> Can we buffer collected wait events locally and spit the buffer to the
> stats collector at convenient moments? We can use a limited buffer
> size with an overflow flag, so we degrade the results rather than
> falling over or forcing excessive stats reporting at inappropriate
> times.

IIUC, currently each backend collects wait events locally. When a backend
goes idle (telling the frontend that it is ready-for-query), it reports its
wait event statistics to the stats collector. The interval between reports
is at least PGSTAT_STAT_INTERVAL (500 ms by default). Each backend also
reports once more when it exits.
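
To illustrate the scheme in code (this is only a standalone sketch; the
struct and function names below are illustrative, not the patch's actual
identifiers):

    #include <stdint.h>
    #include <string.h>
    #include <time.h>

    #define NUM_WAIT_EVENTS    256     /* illustrative table size */
    #define REPORT_INTERVAL_MS 500     /* cf. PGSTAT_STAT_INTERVAL */

    typedef struct WaitEventCounter
    {
        uint64_t    count;             /* how many times the event occurred */
        uint64_t    total_usec;        /* accumulated wait time */
    } WaitEventCounter;

    /* per-backend (process-local) accumulation buffer */
    static WaitEventCounter local_wait_stats[NUM_WAIT_EVENTS];
    static struct timespec last_report;

    /* called where a wait ends, with the measured duration */
    static void
    accumulate_wait(int wait_event, uint64_t elapsed_usec)
    {
        local_wait_stats[wait_event].count += 1;
        local_wait_stats[wait_event].total_usec += elapsed_usec;
    }

    /* called when the backend goes idle (ready-for-query) or exits */
    static void
    maybe_flush_wait_stats(int force)
    {
        struct timespec now;
        long        elapsed_ms;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ms = (now.tv_sec - last_report.tv_sec) * 1000 +
                     (now.tv_nsec - last_report.tv_nsec) / 1000000;
        if (!force && elapsed_ms < REPORT_INTERVAL_MS)
            return;                    /* reported too recently; skip */

        /* send local_wait_stats to the stats collector here, then reset */
        memset(local_wait_stats, 0, sizeof(local_wait_stats));
        last_report = now;
    }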

So if we run the read-only test with 50 clients, each of the 50 backends
reports its wait event statistics to the stats collector roughly every
500 ms. If that is what causes the performance degradation, we could reduce
the overhead by letting backends report their statistics less often, for
example only at backend exit.

(I think I can easily test this by building postgres with
PGSTAT_STAT_INTERVAL set to a value much larger than 500 ms.)
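
For example, just bumping the existing constant in src/include/pgstat.h for
the experiment (600000 ms is an arbitrary large value):

    /* src/include/pgstat.h -- experiment only; the stock value is 500 */
    #define PGSTAT_STAT_INTERVAL    600000  /* minimum time between stats
                                             * reports, in milliseconds */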

> I suggest that this is also a good opportunity to add some more
> tracepoints to PostgreSQL. The wait events facilities are not very
> traceable right now.

Does that mean we would add TRACE_POSTGRESQL_ probes before/after every
pgstat_report_wait_start call?
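
For example (the wait_event_start probe below is only a placeholder, not an
existing probe):

    /* hypothetical addition to the postgresql provider in
     * src/backend/utils/probes.d */
    probe wait__event__start(int);

    /* corresponding call added inside pgstat_report_wait_start(), after
     * wait_event_info is stored */
    TRACE_POSTGRESQL_WAIT_EVENT_START(wait_event_info);

IIUC, without --enable-dtrace that macro expands to a no-op, so it costs
nothing when unused.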

> That way we have a zero-overhead-when-unused option that can also be
> used to aggregate the information per-query, per-user, etc.

I see. That way there is no overhead when DTrace is not enabled, and what we
measure becomes more customizable.

I am also curious what the overhead would be if we implemented wait event
statistics with DTrace scripts, though I can't guess, since I haven't used
DTrace.

--
Yoshikazu Imai
