Re: Admission Control

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Admission Control
Date: 2010-06-25 19:15:59
Message-ID: AANLkTingRgi8PoFRdM0eJVNJpXVosHNdqAK7yUnjM-b9@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 25, 2010 at 1:33 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Recent discussions involving the possible benefits of a connection
> pool for certain users have reminded me of a brief discussion at The
> Royal Oak last month, where I said I would post a reference to a
> concept which might alleviate the need for external connection
> pools.  For those interested, check out section 2.4 of this
> document:
>
> Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007.
> Architecture of a Database System. Foundations and Trends(R) in
> Databases Vol. 1, No. 2 (2007) 141-259.
>
> http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

I think a good admission control system for memory would be huge for
us. There are innumerable threads on pgsql-performance where we tell
people to set work_mem to a tiny value (like 4MB or 16MB) because any
higher value risks driving the machine into swap in the event that
they get an unusually large number of connections or those connections
issue queries that require an unusual number of hashes or sorts.
There are also innumerable postings from people complaining that
external sorts are way slower than in-memory sorts, and of course a
hash join frequently mops the floor with a nested loop with an inner
index scan.
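
To put rough numbers on the risk: each sort or hash node can claim up
to work_mem all by itself, so (as a purely illustrative worst case,
not a measurement) 100 connections each running a query with four
such nodes at work_mem = 64MB could try to grab 100 * 4 * 64MB = 25GB
at once. That's why the standing advice is 4MB, which then wastes RAM
the other 99% of the time.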

A really trivial admission control system might let you set a
system-wide limit on work_mem. As we build a plan, we could estimate
the total amount of work_mem it will require by examining all the
sort, hash, and hash aggregate nodes it contains. In shared memory,
we keep a total of this value across all backends. Just before
beginning to execute a plan that uses >0 work_mem, we bump this value
up by the value for the current plan, unless that would make us exceed
the system-wide limit. In that case, we sleep, and the next person
to finish executing (and decrease the value in shared memory) will
wake us up to try again. (Ideally, we'd want to make
maintenance_work_mem part of this accounting process also; otherwise,
a system that was humming along nicely might suddenly start thrashing
when a maintenance operation kicks off.) I suppose this would take a
good deal of performance testing to see how well it actually works. A
refinement might be to try to consider an inferior plan that uses less
memory when the system is tight on memory, rather than waiting. But
you'd have to be careful about that, because waiting might be better
(it's worth waiting 15 s if it means the execution time will decrease
by > 15 s).
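
To make the bookkeeping concrete, here is a minimal standalone sketch
of that reserve/sleep/wake cycle. It uses a POSIX mutex and condition
variable in place of our shared memory and latch machinery, and the
function names and the 1GB cap are made up for illustration -- this
is a model of the idea, not proposed backend code:

/* admission.c: standalone model of the proposed work_mem accounting */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define WORK_MEM_LIMIT (1024L * 1024 * 1024)    /* illustrative 1GB cap */

static pthread_mutex_t admission_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  admission_cv = PTHREAD_COND_INITIALIZER;
static int64_t total_reserved = 0;  /* stand-in for the shared-memory total */

/*
 * Call just before executing a plan whose sort, hash, and hash
 * aggregate nodes are estimated to need plan_mem bytes in total.
 * Sleeps until the reservation fits under the system-wide limit.
 */
static void
reserve_work_mem(int64_t plan_mem)
{
    pthread_mutex_lock(&admission_lock);
    /* A plan bigger than the whole limit is admitted once nothing
     * else is running, so it cannot sleep forever. */
    while (total_reserved > 0 &&
           total_reserved + plan_mem > WORK_MEM_LIMIT)
        pthread_cond_wait(&admission_cv, &admission_lock);
    total_reserved += plan_mem;
    pthread_mutex_unlock(&admission_lock);
}

/*
 * Call when execution finishes: give the memory back and wake anyone
 * sleeping in reserve_work_mem() so they can retry.
 */
static void
release_work_mem(int64_t plan_mem)
{
    pthread_mutex_lock(&admission_lock);
    total_reserved -= plan_mem;
    pthread_cond_broadcast(&admission_cv);
    pthread_mutex_unlock(&admission_lock);
}

int
main(void)
{
    int64_t plan_mem = 64L * 1024 * 1024;   /* a plan estimated at 64MB */

    reserve_work_mem(plan_mem);
    /* ... execute the plan ... */
    release_work_mem(plan_mem);
    printf("reserved and released %lld bytes\n", (long long) plan_mem);
    return 0;
}

One wrinkle the wait loop has to handle, as the comment notes: a plan
whose estimate exceeds the entire limit would otherwise sleep forever,
so the sketch admits it as soon as nothing else has memory reserved.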

The idea of doling out queries to engine processes is an interesting
one, but seems very different from our current query execution model.
I can't even begin to speculate as to whether there's anything
interesting we could do in that area without reading some literature
on the topic - got any pointers? But even if we can't or don't want
to do that, we already know that limiting the number of backends and
round-robining queries among them performs MUCH better than setting
max_connections to a large value, and not just because of memory
exhaustion. Our answer to that problem is "use a connection pooler",
but that's not a very good solution even aside from the possible
administrative nuisance, because it only solves half the problem. In
the general case, the question is not whether we can currently afford
to allocate 0 or 1 backends to a given query, but whether we can
afford to allocate 0, 1, or >1; furthermore, if we can afford to
allocate >1 backend, we'd ideally like to reuse an existing backend
rather than starting a new one. I don't think anyone's going to be
too happy with a parallel query implementation with a dependency on an
external connection pooler.

One of the difficulties in reusing an existing backend for a new
query, or in maintaining a pool of backends that could be used as
workers for parallel queries, is that our current system does not
require us to have, nor do we have, a way of resetting a backend to a
completely pristine state. DISCARD ALL is probably pretty close,
because AIUI that's what connection poolers are using today, and if it
didn't work reasonably well, we'd have gotten complaints. But DISCARD
ALL will not let you rebind to a different database, for example,
which would be a pretty useful thing to do in a pooling environment,
so that you needn't maintain separate pools for each database, and it
doesn't let you restart the authentication cycle (with no backsies)
either. Of course, addressing these problems wouldn't by itself give
us a built-in connection pooler or parallel query execution, and
there's some danger of destabilizing the code, but possibly it would
be a good way to get off the ground. Not sure, though. Thoughts?
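
For what it's worth, the reset a pooler does today amounts to
something like the following libpq sketch (error handling omitted,
and reset_pooled_connection is my name for it, not an existing API):

#include <libpq-fe.h>

void
reset_pooled_connection(PGconn *conn)
{
    /* Resets session settings, drops temp tables, and discards
     * prepared statements -- but the session stays bound to the same
     * database and the same startup user. */
    PGresult *res = PQexec(conn, "DISCARD ALL");
    PQclear(res);

    /* The only way to "rebind" to a different database today is to
     * throw the connection away and build a new one:
     *
     *     PQfinish(conn);
     *     conn = PQconnectdb("dbname=otherdb ...");
     *
     * which is exactly the setup cost pooling tries to avoid. */
}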

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
