Re: Background Processes and reporting

From: Vladimir Borodin <root(at)simply(dot)name>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Background Processes and reporting
Date: 2016-03-12 17:33:55
Message-ID: FC188775-FF46-4202-9958-6F0E1D3E0A0C@simply.name
Lists: pgsql-hackers


> On 12 March 2016, at 2:45, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
>> Idea of individual time measurement of every wait event met criticism
>> because it might have high overhead [1].
>
> Right. And that's actually one of the points which I meant with "didn't
> listen to criticism". There've been a lot of examples, on and off list,
> where taking timings triggers significant slowdowns. Yes, in some
> bare-metal environments, with a coherent tsc, the overhead can be
> low. But that doesn't make it ok to have a high overhead on a lot of
> other systems.

That’s why the proposal included a GUC for that, with timing measurement turned off by default. I don’t remember any objections to that.

And I’m absolutely sure that a real high-load production system (which of course doesn’t use virtualization or Windows) can’t exist without timing measurements. The Oracle folks have written several chapters (!) about that [0]. Long story short, sampling doesn’t give enough precision. I have shown the overhead [1] on bare-metal Linux with a workload that heavily stresses lwlocks. BTW, Oracle doesn’t give you any way to turn timing measurement off, even with hidden parameters. All other commercial databases have wait monitoring with timing measurement. Let’s do it and turn it off by default so that all other platforms don’t suffer from it.

[0] http://www.amazon.com/Optimizing-Oracle-Performance-Cary-Millsap/dp/059600527X
[1] http://www.postgresql.org/message-id/EEE78E40-0E48-411A-9F90-CF9339DA9698@simply.name
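
To make the precision trade-off concrete, here is a toy sketch that compares exact per-event accounting with periodic sampling on a synthetic timeline of waits. Everything in it is an assumption made for illustration: the event names, the duration ranges, the 95/5 mix, and the 10 ms sampling interval are invented, and it models neither PostgreSQL internals nor the cost of reading the clock.

```python
import bisect
import itertools
import random


def simulate_waits(n_events=50_000, sample_interval_us=10_000, seed=42):
    """Return (exact, sampled) wait time per event type, in microseconds."""
    rng = random.Random(seed)
    names, durations = [], []
    for _ in range(n_events):
        # Hypothetical mix: many short lwlock waits, occasional long I/O waits.
        if rng.random() < 0.95:
            names.append("lwlock")
            durations.append(rng.randint(1, 50))
        else:
            names.append("io")
            durations.append(rng.randint(1_000, 20_000))

    # ends[i] is the time at which event i finishes on the timeline.
    ends = list(itertools.accumulate(durations))
    total = ends[-1]

    # Exact accounting: every wait is timed individually.
    exact = {"lwlock": 0, "io": 0}
    for name, duration in zip(names, durations):
        exact[name] += duration

    # Sampling: look at the timeline once per interval and charge the
    # whole interval to whichever event is in progress at that moment.
    sampled = {"lwlock": 0, "io": 0}
    for t in range(0, total, sample_interval_us):
        idx = bisect.bisect_right(ends, t)
        sampled[names[idx]] += sample_interval_us
    return exact, sampled


if __name__ == "__main__":
    exact, sampled = simulate_waits()
    for name in exact:
        print(name, "exact:", exact[name], "sampled:", sampled[name])
```

Changing `sample_interval_us` or the event mix shows how the sampled totals drift from the exact ones; the sketch says nothing about how expensive the exact timings would be to collect on a given platform.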

>
> Just claiming that that's not a problem will only lead to your position
> not being taken seriously.
>
>
>> This is really so at least for Windows [2].
>
> Measuring timing overhead for a simplistic workload on a single system
> doesn't mean that. Try doing such a test on a vmware esx virtualized
> windows machine, on a multi-socket server; in a lot of instances you'll
> see two-three orders of magnitude longer average times; with peaks going
> into 4-5 orders of magnitude. And, sad as it is, realistically most
> postgres instances will run in virtualized environments.
>
>
>> But accessing only current values wouldn't be very useful. We
>> anyway need to gather some statistics. Gathering it by sampling would be
>> both more expensive and less accurate for majority of systems. This is why
>> I proposed hooks to make possible platform dependent extensions. Robert
>> rejects hook because he is "not a big fan of hooks as a way of resolving
>> disagreements about the design" [3].
>
> I think I agree with Robert here. Providing hooks into very low level
> places tends to lead to problems in my experience; tight control over
> what happens is often important - I certainly don't want any external
> code to run while we're waiting for an lwlock.
>
>
>> Besides that is actually not design issues but platform issues...
>
> I don't see how that's the case.
>
>
>> Another question is wait parameters. We want to expose wait event with
>> some parameters. Robert rejects that because it *might* add additional
>> overhead [3]. When I proposed to fit something useful into hard-won
>> 4-bytes, Robert claims that it is "too clever" [4].
>
> I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
> first step to make progress here.
>
>
> It seems entirely possible to extend the current API in an incremental
> fashion, either allowing to disable the individual pieces, or providing
> sufficient measurements that it's not needed.
>
>
>> So, situation looks like dead-end. I have no idea how to convince Robert
>> about any kind of advanced functionality of wait monitoring to PostgreSQL.
>> I'm thinking about implementing sampling extension over current
>> infrastructure just to make community see that it sucks. Andres, it would
>> be very nice if you have any idea how to move this situation forward.
>
> I've had my share of conflicts with Robert. But if I were in his shoes,
> targeted by this kind of rhetoric, I'd be very tempted to just ignore
> any further arguments from the origin. So I think the way forward is
> for everyone to cool off, and to see how we can incrementally make
> progress from here on.
>
>
>> Another aspect is that EnterpriseDB offers waits monitoring in proprietary
>> fork [5].
>
> So?
>
> Greetings,
>
> Andres Freund

--
May the force be with you…
https://simply.name
