Re: Background Processes and reporting

From: Vladimir Borodin <root(at)simply(dot)name>
To: obartunov(at)gmail(dot)com
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Background Processes and reporting
Date: 2016-03-15 20:41:53
Message-ID: E9CCE450-4573-4373-B212-F0B0CDC6BE5B@simply.name
Lists: pgsql-hackers


> On 15 March 2016, at 19:57, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
>
>
>
> On Tue, Mar 15, 2016 at 7:43 PM, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Tue, Mar 15, 2016 at 12:57 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 14, 2016 at 4:42 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-03-14 16:16:43 -0400, Robert Haas wrote:
> >> > I have already shown [0, 1] the overhead of measuring timings in linux on
> >> > representative workload. AFAIK, these tests were the only one that showed
> >> > any numbers. All other statements about terrible performance have been and
> >> > remain unconfirmed.
> >>
> >> Of course, those numbers are substantial regressions which would
> >> likely make it impractical to turn this on on a heavily-loaded
> >> production system.
> >
> > A lot of people operating production systems are fine with trading a <=
> > 10% impact for more insight into the system; especially if that
> > configuration can be changed without a restart. I know a lot of systems
> > that use pg_stat_statements, track_io_timing = on, etc; just to get
> > that. In fact there's people running perf more or less continuously in
> > production environments; just to get more insight.
> >
> > I think it's important to get as much information out there without
> > performance overhead, so it can be enabled by default. But I don't think
> > it makes sense to not allow features in that cannot be enabled by
> > default, *if* we tried to make them cheap enough beforehand.
>
> Hmm, OK. I would have expected you to be on the other side of this
> question, so maybe I'm all wet. One point I am concerned about is
> that, right now, we have only a handful of types of wait events. I'm
> very interested in seeing us add more, like I/O and client wait. So
> any overhead we pay here is likely to eventually be paid in a lot of
> places - thus it had better be extremely small.
>
> OK. Let's start to produce light, not heat.
>
> As I understand it, we have two features that we suspect of introducing overhead:
> 1) Recording parameters of wait events, which requires some kind of synchronization protocol.
> 2) Recording the time of wait events, because time measurements might be expensive on some platforms.
>
> At the same time, there are machines and workloads where neither of these features produces measurable overhead. And we're not talking about toy databases. Vladimir is a DBA at Yandex, which is among the top 20 internet companies in the world by traffic. They run both of these features on a high-load production database without noticing any overhead from them.
>
> It would be great progress, if we decide that we could add both of these features controlled by GUC (off by default).
>
> enable_waits_statistics ?
>
>
> If we decide so, then let's start working on this. First, we should put together a list of machines and workloads for testing. No such list will be comprehensive, but let's find something that is enough for testing GUC-controlled, off-by-default features. Then we can turn our conversation from theoretical thoughts to particular benchmarks that would be objective and convincing to everybody.
>
> Vladimir, could you provide a test suite, so other people could measure the overhead on their machines?

I have roughly described it here [0]. Since most of the concerns were about LWLocks, the plan was to reproduce a workload under heavy LWLock contention. This can easily be done even with pgbench, in the following two scenarios:
1. Put all the data in shared buffers and on tmpfs and run a read/write test. Contention would be around ProcArrayLock.
2. Put all the data in RAM, but not all of it in shared buffers, and run a read-only test. Contention would be in the buffer manager.

IMHO, these two tests are representative enough and do not depend much on hardware.
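
As a rough illustration of the two scenarios, here is a minimal sketch that only builds the pgbench command lines; it does not run anything. The database name, scale factor, client and thread counts, and duration are placeholder assumptions, not values from the tests referenced in [0]. The server-side part (sizing shared_buffers relative to the data set, placing the data directory on tmpfs) is not covered by the script and is only noted in comments.

```python
import shlex


def pgbench_commands(dbname="bench", scale=100, clients=32, threads=8, duration_s=300):
    """Build pgbench argv lists for initialization and the two scenarios.

    All default values are illustrative assumptions; pick a scale so the
    data set fits in shared_buffers (scenario 1) or exceeds shared_buffers
    while still fitting in RAM (scenario 2).
    """
    # Create and populate the standard pgbench tables.
    init = ["pgbench", "-i", "-s", str(scale), dbname]

    # Options shared by both runs: clients, worker threads, run time in seconds.
    common = ["-c", str(clients), "-j", str(threads), "-T", str(duration_s)]

    # Scenario 1: default TPC-B-like read/write script.
    # Server side (not done here): data directory on tmpfs, all data in shared buffers.
    read_write = ["pgbench", *common, dbname]

    # Scenario 2: built-in select-only script (-S).
    # Server side (not done here): data in RAM, shared_buffers smaller than the data set.
    read_only = ["pgbench", "-S", *common, dbname]

    return {"init": init, "read_write": read_write, "read_only": read_only}


if __name__ == "__main__":
    for name, argv in pgbench_commands().items():
        print(f"{name}: {shlex.join(argv)}")
```

Running the same pair of commands against a build with the feature enabled and one with it disabled, on the same machine, would give the overhead comparison discussed in this thread.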

[0] http://www.postgresql.org/message-id/EEE78E40-0E48-411A-9F90-CF9339DA9698@simply.name

>
>
>
>
> Otherwise, let's just add these features to the list of unwanted functionality and close this question.
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
May the force be with you…
https://simply.name
