Re: Wait events monitoring future development

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "ik(at)postgresql-consulting(dot)com" <ik(at)postgresql-consulting(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wait events monitoring future development
Date: 2016-08-10 20:39:00
Message-ID: CA+TgmobekAmfvPseUaFCHuNa6RGvnZT+fFuJaa7qKvMUV-CL-Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 9, 2016 at 12:07 AM, Tsunakawa, Takayuki
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> As another idea, we can stand on the middle ground. Interestingly, MySQL also enables their event monitoring (Performance Schema) by default, but not all events are collected. I guess highly encountered events are not collected by default to minimize the overhead.

Yes, I think that's a sensible approach. I can't see enabling by
default a feature that significantly regresses performance. We work
too hard to improve performance to throw very much of it away for any
one feature, even a feature that a lot of people like. What I really
like about what got committed to 9.6 is that it's so cheap we should
be able to use it for lots of other things - latch events, network I/O,
disk I/O, etc. without hurting performance at all.

But if we start timing those events, it's going to be really
expensive. Even just counting them or keeping a history will cost a
lot more than just publishing them while they're active, which is what
we're doing now.
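
To make the cost difference concrete, here is a minimal sketch, not PostgreSQL code. The `Backend` class and both wrapper functions are hypothetical names invented for illustration. Publishing an event is a single store before and after the wait; timing it adds two clock reads and two accumulator updates on every wait.

```python
import time


class Backend:
    """Hypothetical stand-in for one backend's status slot."""

    def __init__(self):
        self.wait_event = None   # what the backend is waiting on right now
        self.wait_counts = {}    # event name -> number of completed waits
        self.wait_time_ns = {}   # event name -> total nanoseconds waited


def wait_published(backend, event, do_wait):
    """Publish the event only while the wait is in progress."""
    backend.wait_event = event
    try:
        return do_wait()
    finally:
        backend.wait_event = None


def wait_timed(backend, event, do_wait):
    """Publish the event and also count and time it."""
    backend.wait_event = event
    start = time.perf_counter_ns()          # extra clock read before the wait
    try:
        return do_wait()
    finally:
        elapsed = time.perf_counter_ns() - start   # extra clock read after
        backend.wait_event = None
        backend.wait_counts[event] = backend.wait_counts.get(event, 0) + 1
        backend.wait_time_ns[event] = backend.wait_time_ns.get(event, 0) + elapsed
```

In the first variant an observer can only sample what is being waited on at a given instant; the second keeps per-event totals, and that bookkeeping is paid on every wait, however short.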

> BTW, I remember EnterpriseDB has a wait event monitoring feature. Is it disabled by default? What was the overhead?

Timed events in Advanced Server are disabled by default. I haven't
actually tested the overhead myself and I don't remember exactly what
the numbers were the last time someone else did, but I think if you
turn edb_timed_statistics on, it's pretty expensive. If we can
agree on something sensible here, I imagine we'll get rid of that
feature in Advanced Server in favor of whatever the community settles
on. But if the community agrees to turn on something by default that
costs a measurable percentage in performance, I predict that Advanced
Server 10 will ship with a different default for that feature than
PostgreSQL 10.

Personally, I think too much of this thread (and previous threads) is
devoted to arguing about whether it's OK to make performance worse,
and by how much we'd be willing to make it worse. What I think we
ought to be talking about is how to design a feature that produces the
most useful data at the lowest possible performance cost, for example by
not measuring wait times for events that are very frequent or
waits that are very short. Or, maybe we could have a background
process that updates a timestamp in shared memory every millisecond,
and other processes can read that value instead of making a system
call. I think on Linux systems with fast clocks the operating system
basically does something like that for you, but there might be other
systems where it helps. Of course, it could also skew the results if
the system is so overloaded that the clock-updater process gets
descheduled for a lengthy period of time.
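
A rough sketch of the clock-updater idea follows, using a thread in place of a background process and an object attribute in place of shared memory. `CoarseClock` and its methods are hypothetical names, and in Python the read is not actually cheaper than calling the clock directly; the point is only the structure, with one writer refreshing a timestamp and many readers copying it.

```python
import threading
import time


class CoarseClock:
    """One updater refreshes a cached timestamp; readers just copy it."""

    def __init__(self, interval_s=0.001):
        self._interval_s = interval_s          # target refresh period (1 ms)
        self._now_ns = time.monotonic_ns()     # the cached, shared timestamp
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self._now_ns = time.monotonic_ns()

    def now_ns(self):
        # Readers never call the clock themselves. The value can be stale by
        # one interval, or by much more if the updater is descheduled.
        return self._now_ns
```

A reader would call `now_ns()` before and after a wait and subtract. Any wait shorter than the refresh interval can measure as zero, and a stalled updater makes every wait in that period look shorter than it was.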

Anyway, I disagree with the idea that this feature is stalled or
blocked in some way. I (and quite a few other people, though not
everyone) oppose making performance significantly worse in the default
configuration. I oppose that regardless of whether it is a
hypothetical patch for this feature that causes the problem or whether
it is a hypothetical patch for some other feature that causes the
problem. I am not otherwise opposed to more work in this area; in
fact, I'm rather in favor of it. But you can count on me to argue
against pretty much everything that causes a performance regression,
whatever the reason. Virtually every release, at least one developer
proposes some patch that slows the server down by "only" 1-2%. If
we'd accepted all of the patches that were shot down because of such
impacts, we'd have lost a very big chunk of performance between the
time I started working on PostgreSQL and now.

As it is, our single-threaded performance seems to have regressed
noticeably since 9.1:

http://bonesmoses.org/2016/01/08/pg-phriday-how-far-weve-come/

I think that's awful. But if we'd accepted all of those patches that
cost "only" one or two percentage points, it would probably be -15% or
-25% rather than -4.4%. I think that if we want to really be
successful as a project, we need to make that number go UP, not down.
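
The arithmetic behind that estimate can be sketched as follows. The patch count and the per-patch cost in the usage line are assumptions chosen for illustration, not measured figures, and `cumulative_change` is a hypothetical helper.

```python
def cumulative_change(per_patch_slowdown, patches):
    """Fractional throughput change after `patches` regressions that each
    cost `per_patch_slowdown` (0.015 means 1.5%), compounded."""
    return (1.0 - per_patch_slowdown) ** patches - 1.0


# Ten hypothetical patches that each cost 1.5%:
print(f"{cumulative_change(0.015, 10):.1%}")
```

Slowdowns multiply rather than add, but at 1-2% each the compounded total is still close to the simple sum, so a handful of "small" regressions per release gets to double digits within a few releases.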

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
