Re: Wait events monitoring future development

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "ik(at)postgresql-consulting(dot)com" <ik(at)postgresql-consulting(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wait events monitoring future development
Date: 2016-08-09 23:09:00
Message-ID: 9eda4c7a-6149-7493-5339-099a787e8cfd@BlueTreble.com
Lists: pgsql-hackers

On 8/8/16 11:07 PM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-owner(at)postgresql(dot)org
>> > If you want to know why people are against enabling this monitoring by
>> > default, above is the reason. What percentage of people do you think would
>> > be willing to take a 10% performance penalty for monitoring like this? I
>> > would bet very few, but the argument above doesn't seem to address the fact
>> > it is a small percentage.
>> >
>> > In fact, the argument above goes even farther, saying that we should enable
>> > it all the time because people will be unwilling to enable it on their own.
>> > I have to question the value of the information if users are not willing
>> > to enable it. And the solution proposed is to force the 10% default overhead
>> > on everyone, whether they are currently doing debugging, whether they will
>> > ever do this level of debugging, because people will be too scared to enable
>> > it. (Yes, I think Oracle took this
>> > approach.)

Let's put this in perspective: there are tons of companies that spend
thousands of dollars per month extra by running un-tuned systems in
cloud environments. I almost called that "waste" but in reality it
should be a simple business question: is it worth more to the company to
spend resources on reducing the AWS bill or rolling out new features?
It's something that can be estimated and a rational business decision made.

Where things become completely *irrational* is when a developer reads
something like "plpgsql blocks with an EXCEPTION handler are more
expensive" and they freak out and spend a bunch of time trying to avoid
them, without even the faintest idea of what that overhead actually is.
More important, they haven't the faintest idea of what that overhead
costs the company, vs what it costs the company for them to spend an
extra hour trying to avoid the EXCEPTION (and probably introducing code
that's far more bug-prone in the process).

So in reality, the only people likely to notice even something as large
as a 10% hit are those that were already close to maxing out their
hardware anyway.

The downside to leaving stuff like this off by default is that users won't
remember it's there when they need it. At best, that means they spend
more time debugging something than they need to. At worst, it means they
suffer a production outage for longer than they need to, and the cost of
that can easily exceed many months' or years' worth of the extra cost from
the monitoring overhead.
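
To make that trade-off concrete, here is a minimal break-even sketch in
Python. Every dollar figure and the function name months_to_break_even are
hypothetical, chosen only for illustration; the 10% overhead is the worst
case floated in this thread, not a measured number.

```python
# Rough break-even estimate: how many months of always-on monitoring
# overhead does one shortened outage pay for?
# All inputs below are assumed values, not data from the thread.


def months_to_break_even(monthly_infra_cost, overhead_fraction,
                         outage_cost_per_hour, outage_hours_saved):
    """Return the months of monitoring overhead covered by one shorter outage."""
    # Extra spend per month caused by the monitoring overhead.
    monthly_overhead_cost = monthly_infra_cost * overhead_fraction
    # Money saved because the outage was diagnosed faster.
    outage_savings = outage_cost_per_hour * outage_hours_saved
    return outage_savings / monthly_overhead_cost


months = months_to_break_even(
    monthly_infra_cost=5_000.0,    # assumed monthly hardware/cloud bill
    overhead_fraction=0.10,        # worst-case overhead discussed in the thread
    outage_cost_per_hour=3_000.0,  # assumed cost of one hour of downtime
    outage_hours_saved=2.0,        # assumed debugging time saved by the data
)
print(f"One shortened outage covers about {months:.1f} months of overhead")
```

With a smaller overhead fraction the break-even period grows in proportion,
so the comparison only gets more favorable if the real overhead is below 10%.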

>> > We can talk about this feature all we want, but if we are not willing to
>> > be realistic in how much performance penalty the _average_ user is willing
>> > to lose to have this monitoring, I fear we will make little progress on
>> > this feature.
> OK, 10% was an overstatement. Anyway, as Amit said, we can discuss the default value based on the performance evaluation before release.
>
> As another idea, we can stand on the middle ground. Interestingly, MySQL also enables their event monitoring (Performance Schema) by default, but not all events are collected. I guess highly encountered events are not collected by default to minimize the overhead.

That's what we currently do with several track_* and log_*_stats GUCs,
several of which I forgot even existed until just now. Since there's
some question over the actual overhead, maybe that's a prudent approach
for now, but I think we should be striving to enable these things ASAP.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
