Re: Background Processes and reporting

From: Vladimir Borodin <root(at)simply(dot)name>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Background Processes and reporting
Date: 2016-03-14 19:54:06
Message-ID: 9FE8342A-5A38-4F75-98F6-D1754FFE6CA1@simply.name
Lists: pgsql-hackers


> On 14 Mar 2016, at 22:21, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
>>> So?
>>
>> So, Robert already has experience with the subject, probably, he has bad
>> experience with edb implementation and he'd like to see something better in
>> community version. That's fair and I accept his position.
>
> Bingo - though maybe "bad" experience is not quite as accurate as
> "could be better".
>
>> Wait monitoring is one of the popular requirement of russian companies, who
>> migrated from Oracle. Overwhelming majority of them use Linux, so I suggest
>> to have configure flag for including wait monitoring at compile time
>> (default is no wait monitoring), or have GUC variable, which is also off by
>> default, so we have zero to minimal overhead of monitoring. That way we'll
>> satisfy many enterprises and help them to choose postgres, will get feedback
>> from production use and have time for feature improving.
>
> So, right now we can only display the wait information in
> pg_stat_activity. There are a couple of other things that somebody
> might want to do:
>
> 1. Sample the wait state information across all backends in the
> system. On a large, busy system, this figures to be quite cheap, and
> the sampling interval could be configurable.
>
> 2. Count every instance of every wait event in every backend, and roll
> that up either via shared memory or additional stats messages.
>
> 3. Like #2, but with timing information.
>
> 4. Like #2, but on a per-query basis, somehow integrated with
> pg_stat_statements.

5. Show extra information about the wait event (e.g. exclusive or shared mode for LWLocks, relation/fork/block numbers for I/O operations, etc.).
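
To make that concrete, the per-backend slot could carry a small event-specific union next to the already committed wait_event_info. This is just a sketch; all the names below are invented, not the committed API:

/* Sketch only: invented names, not the committed API. */
#include "postgres.h"

typedef struct WaitEventDetail
{
    uint32      wait_event_info;    /* class + event, as already committed */
    union
    {
        struct
        {
            int         tranche;    /* which LWLock tranche */
            bool        exclusive;  /* exclusive or shared mode */
        }           lwlock;
        struct
        {
            Oid         relid;      /* relation being read or written */
            int         forknum;    /* fork number */
            uint32      blocknum;   /* block number */
        }           io;
    }           detail;
} WaitEventDetail;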

>
> The challenge with any of these except #1 is that they are going to
> produce a huge volume of data, and, whether you believe it or not, #3
> is going to sometimes be crushingly slow. Really. I tend to think
> that #1 might be better than #2 or #3, but I'm not unwilling to listen
> to contrary arguments, especially if backed up by careful benchmarking
> showing that the performance hit is negligible.

I have already shown [0, 1] the overhead of measuring timings on Linux on a representative workload. AFAIK, those tests were the only ones that showed any numbers. All other statements about terrible performance have been and remain unconfirmed.

As for the size of such information, it should of course be configurable. For example, Oracle has a parameter for the size of the ring buffer that stores the sampling history together with extra information about each wait event.
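
Something like this, say (again just a sketch; wait_history_size is an invented GUC name, not an existing setting):

/* Sketch only: a sampling history ring buffer sized by a GUC. */
#include "postgres.h"

typedef struct WaitSample
{
    int64       sample_time_us;     /* when the sample was taken */
    int         backend_pid;        /* which backend was sampled */
    uint32      wait_event_info;    /* what it was waiting on */
} WaitSample;

static WaitSample *wait_history;        /* allocated in shared memory */
static int  wait_history_size = 8192;   /* GUC: number of samples retained */
static uint64 wait_history_next = 0;

static void
record_wait_sample(int64 now_us, int pid, uint32 wait_event_info)
{
    /* Overwrite the oldest slot, so memory use is bounded by the GUC. */
    WaitSample *slot = &wait_history[wait_history_next++ % wait_history_size];

    slot->sample_time_us = now_us;
    slot->backend_pid = pid;
    slot->wait_event_info = wait_event_info;
}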

[0] http://www.postgresql.org/message-id/EEE78E40-0E48-411A-9F90-CF9339DA9698@simply.name
[1] http://www.postgresql.org/message-id/5F3DD73A-2A85-44BF-9F47-54049A81C981@simply.name

> My reason for wanting
> to get the stuff we already had committed first is because I have
> found that it is best to proceed with these kinds of problems
> incrementally, not trying to solve too much in a single commit. Now
> that we have the basics, we can build on it, adding more wait events
> and possibly more recordkeeping for the ones we have already - but
> anything that regresses performance for people not using the feature
> is a dead end in my book, as is anything that introduces overall
> stability risks.

Ok, doing it in short steps seems to be a good plan. Any objections to giving people the ability to turn on such a feature (e.g. the notorious timing measurements) even if it causes some performance degradation? Of course, it should be turned off by default.
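
For instance (just a sketch; track_wait_timing is an invented GUC name), the hot path would then pay only a single branch while the feature is off:

/* Sketch only: timing measured only when the (invented) GUC is on. */
#include "postgres.h"
#include "portability/instr_time.h"

static bool track_wait_timing = false;  /* GUC, off by default */

static void
wait_start(instr_time *start)
{
    if (track_wait_timing)
        INSTR_TIME_SET_CURRENT(*start);
    else
        INSTR_TIME_SET_ZERO(*start);
}

static uint64
wait_end_us(const instr_time *start)
{
    instr_time  end;

    if (!track_wait_timing)
        return 0;
    INSTR_TIME_SET_CURRENT(end);
    INSTR_TIME_SUBTRACT(end, *start);   /* end now holds the elapsed time */
    return INSTR_TIME_GET_MICROSEC(end);
}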

>
> I think the way forward from here is that Postgres Pro should (a)
> rework their implementation to work with what has already been
> committed, (b) consider carefully whether they've done everything
> possible to contain the performance loss, (c) benchmark it on several
> different machines and workloads to see how much performance loss
> there is, and (d) stop accusing me of acting in bad faith.

For the record, I'm not from Postgres Pro and I'm not "accusing you" of anything. But to be honest, the currently committed implementation has been tested on exactly one machine with two workloads, so it seems somewhat unfair to demand more from others. That doesn't mean, of course, that testing on exactly one machine with only one OS is enough. I suppose you should ask the authors to test on representative hardware and workloads, but if the authors don't have access to such hardware, it would be nice to help them with that.

Also, it would be really interesting to hear your opinion on Andres's initial question. Any thoughts about changing the currently committed implementation?

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
May the force be with you…
https://simply.name
