Re: System load consideration before spawning parallel workers

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: System load consideration before spawning parallel workers
Date: 2016-09-01 17:01:35
Message-ID: aca21e2c-746d-36aa-103a-275ce24bb395@2ndquadrant.com

On 8/16/16 3:39 AM, Haribabu Kommi wrote:
> Yes, we need to consider many parameters as system load, not only
> the CPU. I have attached a POC patch that implements the CPU load
> calculation and decides the number of workers based on the current CPU
> load. The load calculation code is not optimized; there are many ways
> that could be used to calculate the system load. This is just an example.
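For illustration only, here is a minimal C sketch of the general idea of sizing the worker count from CPU load. It is not the algorithm in the POC patch. The function name workers_for_load, the use of a single load figure (such as a 1-minute load average), and the choice to reserve one CPU for the leader process are all assumptions made for this example.

```c
#include <assert.h>

/*
 * Hypothetical sketch, not the POC patch: grant at most as many parallel
 * workers as there appear to be idle CPUs, after reserving one CPU for the
 * leader backend.  "load" is whatever load figure the platform provides.
 */
static int
workers_for_load(double load, int ncpus, int requested)
{
    double idle;
    int    available;

    assert(ncpus > 0);
    assert(requested >= 0);

    /* CPUs not already occupied by runnable processes. */
    idle = (double) ncpus - load;
    if (idle < 0.0)
        idle = 0.0;

    /* Round down, then keep one idle CPU for the leader itself. */
    available = (int) idle - 1;
    if (available < 0)
        available = 0;

    return requested < available ? requested : available;
}
```

With 8 CPUs and a load of 2.0, a request for 4 workers is granted in full; at a load of 6.5 the same request is reduced to 0.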

I see a number of discussion points here:

We don't yet have enough field experience with the parallel query
facilities to know what usage patterns there are and what kind of load
management we need. So I think building a highly specific system like
this is premature. We have settings to limit the number of processes,
which seem OK as a start, and those knobs have worked reasonably well
in other areas (e.g., max_connections, autovacuum). We might well want
to enhance this area, but we'll need more experience and information
first.
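The existing knobs act as static caps. Below is a simplified sketch, assuming the 9.6 behaviour in which parallel workers are dynamic background workers: the planned worker count is clamped by max_parallel_workers_per_gather, and at run time only as many workers start as there are free max_worker_processes slots. The function clamp_workers and its parameters are hypothetical; this is not PostgreSQL source.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model of static process caps.  Parameter names
 * mirror the GUCs max_parallel_workers_per_gather and max_worker_processes.
 */
static int
clamp_workers(int planned, int per_gather_limit,
              int max_worker_processes, int workers_in_use)
{
    int free_slots;

    assert(planned >= 0 && per_gather_limit >= 0);
    assert(max_worker_processes >= 0 && workers_in_use >= 0);

    /* Per-query cap, applied when the plan is made. */
    if (planned > per_gather_limit)
        planned = per_gather_limit;

    /* Workers come from the shared pool of background-worker slots. */
    free_slots = max_worker_processes - workers_in_use;
    if (free_slots < 0)
        free_slots = 0;

    return planned < free_slots ? planned : free_slots;
}
```

If 7 of 8 worker slots are busy, a Gather planned with 2 workers launches only 1, and the query still runs with the leader participating.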

If we think that checking the CPU load is a useful way to manage process
resources, why not apply this to more kinds of processes? I could
imagine that limiting connections by load could be useful. Parallel
workers are only one specific niche of this problem.

As I just wrote in another message in this thread, I don't trust system
load metrics very much as a gatekeeper. They are reasonable for
long-term charting to discover trends, but there are numerous potential
problems with using them for this kind of resource control.
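One concrete problem with load averages as a gatekeeper is lag. On Linux the 1-minute load average is an exponentially damped average of runnable (and uninterruptible, e.g. waiting on I/O) tasks, updated every 5 seconds with decay factor exp(-5/60); the kernel does this in fixed point (EXP_1 = 1884 out of 2048). The floating-point model below, with the hypothetical function load_after, assumes that update rule.

```c
#include <assert.h>

/* exp(-5/60): per-update decay for a 1-minute window sampled every 5 s. */
#define LOAD_DECAY_1MIN 0.920044414629323

/*
 * Model of the 1-minute load average: starting from start_load, with a
 * constant number of active tasks, after the given number of seconds.
 */
static double
load_after(double start_load, double active_tasks, int seconds)
{
    double load = start_load;
    int    updates;
    int    i;

    assert(start_load >= 0.0 && active_tasks >= 0.0 && seconds >= 0);

    /* One update per full 5-second interval. */
    updates = seconds / 5;
    for (i = 0; i < updates; i++)
        load = load * LOAD_DECAY_1MIN + active_tasks * (1.0 - LOAD_DECAY_1MIN);

    return load;
}
```

In this model a jump from 0 to 8 busy tasks reads as a load of only about 0.64 after 5 seconds and about 5.06 after a minute; once the tasks finish, the figure still reads about 2.94 a minute later. A gatekeeper sampling it can admit workers into an already busy system and refuse them on an idle one.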

All of this seems very platform-specific, too. You have
Windows-specific code, but the rest seems very Linux-specific. I had
never heard of the dstat tool before. There is code using cgroups,
and I don't know how portable that is across different Linux
installations. Something about Solaris was mentioned. What about the
rest? How can we maintain this in the long term? How do we know that
these facilities actually work correctly and do not cause mysterious problems?

There is a fair amount of math in there that is barely documented. I
can't tell what any of it is supposed to do without reverse engineering
the code.

My suggestion is that we focus on refining the process control numbers
that we already have. We had extensive discussions about that during
the 9.6 beta. We have related patches in the commit fest right now. Many
ideas have been posted. System admins are generally able to count their
CPUs and match that to the number of sessions and jobs they need to run.
Anything beyond that could be great, but it seems premature until we
have the basics figured out.
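The kind of arithmetic an administrator can do by hand can be written out as a small sketch. The rule used here (divide the CPUs among the expected concurrent sessions, and leave one CPU per session for its own backend, which also executes part of a parallel plan) is a hypothetical rule of thumb, not a recommendation from this thread, and per_gather_for_budget is an illustrative name.

```c
#include <assert.h>

/*
 * Hypothetical rule of thumb for choosing a per-query worker limit from
 * the CPU count and the expected number of concurrent sessions.
 */
static int
per_gather_for_budget(int ncpus, int concurrent_sessions)
{
    int per_session;

    assert(ncpus > 0 && concurrent_sessions > 0);

    /* CPUs available to each session if all run at once. */
    per_session = ncpus / concurrent_sessions;

    /* One of those CPUs is used by the session's own backend (the leader). */
    return per_session > 1 ? per_session - 1 : 0;
}
```

For example, 16 CPUs shared by 4 concurrent sessions gives 3 workers per Gather; with 16 sessions it gives 0, i.e. no parallelism at all.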

Maybe a couple of hooks could be useful to allow people to experiment
with this. The hooks should be more general, though, as described above.
Alternatively, a few GUC settings that can be adjusted at run time might
be sufficient.
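PostgreSQL hooks are conventionally global function pointers that an extension sets (typically in _PG_init) and that core code calls when they are non-NULL. Here is a minimal sketch of a more general admission hook in that style, covering several process kinds rather than only parallel workers. The enum, process_admission_hook, admit_processes and cautious_policy are all hypothetical and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process kinds a general admission hook might see. */
typedef enum ProcessKind
{
    PROCESS_KIND_CLIENT_BACKEND,
    PROCESS_KIND_PARALLEL_WORKER,
    PROCESS_KIND_AUTOVACUUM_WORKER
} ProcessKind;

/* Hypothetical hook: return how many of "requested" processes may start. */
typedef int (*process_admission_hook_type) (ProcessKind kind, int requested);

/* NULL means no extension has installed a policy. */
static process_admission_hook_type process_admission_hook = NULL;

/* What core code might call before starting processes of a given kind. */
static int
admit_processes(ProcessKind kind, int requested, int static_limit)
{
    int allowed;

    assert(requested >= 0 && static_limit >= 0);

    /* The static, GUC-style cap always applies first. */
    allowed = requested < static_limit ? requested : static_limit;

    if (process_admission_hook != NULL)
    {
        int hooked = process_admission_hook(kind, allowed);

        /* A hook may lower the count but never raise it above the cap. */
        if (hooked < allowed)
            allowed = hooked;
        if (allowed < 0)
            allowed = 0;
    }
    return allowed;
}

/* Example extension policy: at most one parallel worker, others unchanged. */
static int
cautious_policy(ProcessKind kind, int requested)
{
    if (kind == PROCESS_KIND_PARALLEL_WORKER && requested > 1)
        return 1;
    return requested;
}
```

Letting the hook only lower the count keeps the static settings authoritative, so an experimental policy cannot exceed what the administrator configured.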

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
