Re: Wait events monitoring future development

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "ik(at)postgresql-consulting(dot)com" <ik(at)postgresql-consulting(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wait events monitoring future development
Date: 2016-08-10 04:51:41
Message-ID: 0A3221C70F24FB45833433255569204D1F5C0889@G01JPEXMBYT05
Lists: pgsql-hackers

From: pgsql-hackers-owner(at)postgresql(dot)org
> Let's put this in perspective: there's tons of companies that spend thousands
> of dollars per month extra by running un-tuned systems in cloud environments.
> I almost called that "waste" but in reality it should be a simple business
> question: is it worth more to the company to spend resources on reducing
> the AWS bill or rolling out new features?
> It's something that can be estimated and a rational business decision made.
>
> Where things become completely *irrational* is when a developer reads
> something like "plpgsql blocks with an EXCEPTION handler are more expensive"
> and they freak out and spend a bunch of time trying to avoid them, without
> even the faintest idea of what that overhead actually is.
> More important, they haven't the faintest idea of what that overhead costs
> the company, vs what it costs the company for them to spend an extra hour
> trying to avoid the EXCEPTION (and probably introducing code that's far
> more bug-prone in the process).
>
> So in reality, the only people likely to notice even something as large
> as a 10% hit are those that were already close to maxing out their hardware
> anyway.
>
> The downside to leaving stuff like this off by default is users won't
> remember it's there when they need it. At best, that means they spend more
> time debugging something than they need to. At worst, it means they suffer
> a production outage for longer than they need to, and that can easily exceed
> many months/years worth of the extra cost from the monitoring overhead.

I rather like this positive way of thinking. It is better to treat event monitoring as a feature for (daily) proactive improvement, not only as a debugging feature, which carries a negative image. For example, pgAdmin4 could display the 10 most time-consuming events and their solutions. Suppose the DBA initially places the database and WAL on the same volume. As the system grows and the write workload increases, pgAdmin4 could suggest that the DBA prepare for that growth by moving WAL to another volume to reduce WALWriteLock wait events. This is not debugging, but proactive monitoring.
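
To make the "10 most time-consuming events" idea a little more concrete, here is a minimal sketch, assuming wait events are collected by sampling each backend at a fixed interval (for example by polling the wait_event_type and wait_event columns of pg_stat_activity). The function name top_wait_events, the 10 ms interval, and the sample data are all hypothetical and are not taken from any existing tool. Time per event is only estimated as the number of samples multiplied by the interval.

```python
from collections import Counter

# Assumed sampling interval; a real collector would choose its own.
SAMPLE_INTERVAL_MS = 10


def top_wait_events(samples, n=10, interval_ms=SAMPLE_INTERVAL_MS):
    """Rank sampled wait events by estimated total wait time.

    samples: iterable of (wait_event_type, wait_event) tuples, or None
             when the sampled backend was not waiting.
    Returns a list of ((type, event), estimated_ms), largest first.
    """
    counts = Counter(s for s in samples if s is not None)
    return [(event, hits * interval_ms) for event, hits in counts.most_common(n)]


# Made-up samples for illustration only.
samples = (
    [("LWLockNamed", "WALWriteLock")] * 40
    + [("Lock", "transactionid")] * 10
    + [("LWLockTranche", "buffer_content")] * 5
    + [None] * 45  # backend was running, not waiting
)

report = top_wait_events(samples)
for (event_type, event_name), ms in report:
    print(f"{event_type}:{event_name}  ~{ms} ms")
```

A tool could map the top entry of such a report to a suggestion, such as moving WAL to a separate volume when WALWriteLock dominates.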

> > As another idea, we can stand on the middle ground. Interestingly, MySQL
> > also enables their event monitoring (Performance Schema) by default, but
> > not all events are collected. I guess highly encountered events are not
> > collected by default to minimize the overhead.
>
> That's what we currently do with several track_* and log_*_stats GUCs,
> several of which I forgot even existed until just now. Since there's question
> over the actual overhead maybe that's a prudent approach for now, but I
> think we should be striving to enable these things ASAP.

Agreed. And as Bruce said, it may be better to allow disabling collection of events that have a visible impact on performance.

Regards
Takayuki Tsunakawa
