Re: Scalability in postgres

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Dimitri <dimitrik(dot)fr(at)gmail(dot)com>, Flavio Henrique Araque Gurgel <flavio(at)4linux(dot)com(dot)br>, Fabrix <fabrixio1(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Scalability in postgres
Date: 2009-06-04 10:57:57
Message-ID: 603c8f070906040357g7f99c103tc29433e2f04d7390@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jun 3, 2009 at 5:09 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
> On 6/3/09 11:39 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jun 3, 2009 at 2:12 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>>> Postgres could fix its connection scalability issues -- that is entirely
>>> independent of connection pooling.
>>
>> Really?  I'm surprised.  I thought the two were very closely related.
>> Could you expand on your thinking here?
>
> They are closely related only by coincidence of Postgres' flaws.
> If Postgres did not scale so poorly as idle connections increase (or as
> active ones increased), they would be rarely needed at all.
>
> Most connection pools in clients (JDBC, ODBC, for example) are designed to
> limit the connection create/close count, not the number of idle connections.
> They reduce creation/deletion specifically by leaving connections idle for a
> while to allow re-use. . .
>
> Other things that can be called "connection concentrators" differ in that
> they are additionally trying to put a band-aid over server design flaws that
> make idle connections hurt scalability.  Or to prevent resource consumption
> issues that the database doesn't have enough control over on its own (again,
> a flaw -- a server should be as resilient to bad client behavior and its
> resource consumption as possible).
>
>
> Most 'modern' server designs throttle active actions internally.  Apache's
> (very old, and truly somewhat 1995-ish) process or thread per connection
> model is being abandoned for event driven models in the next version, so it
> can scale like the higher performing web servers to 20K+ keep-alive
> connections with significantly fewer threads / processes.
>
> SQL is significantly more complicated than HTTP and requires a lot more
> state which dictates a very different design, but nothing about it requires
> idle connections to cause reduced SMP scalability.
>
> In addition to making sure idle connections have almost no impact on
> performance (just eat up some RAM), scalability as active queries increase
> is important.  Although the OS is responsible for a lot of this, there are
> many things that the application can do to help out.  If Postgres had a
> "max_active_connections" parameter for example, then the memory used by
> work_mem would be related to this value and not max_connections.  This would
> further make connection poolers/concentrators less useful from a performance
> and resource management perspective.
>
> Once the above is done, connection pooling, whether integrated or provided
> by a third party, would mostly only have value for clients who cannot pool
> or cache connections on their own.  This is the state of connection pooling
> with most other DB's today.

I think I see the distinction you're drawing here. IIUC, you're
arguing that other database products use connection pooling to handle
rapid connect/disconnect cycles and to throttle the number of
simultaneous queries, but not to cope with the possibility of large
numbers of idle sessions. My limited understanding of why PostgreSQL
has a problem in this area is that it has to do with the size of the
process array which must be scanned to derive an MVCC snapshot. I'd
be curious to know if anyone thinks that's correct, or not.

Assuming for the moment that it's correct, databases that don't use
MVCC won't have this problem, but they give up a significant amount of
scalability in other areas due to increased blocking (in particular,
writers will block readers). So how do other databases that *do* use
MVCC mitigate this problem? The only one that we've discussed here is
Oracle, which seems to get around the problem by having a built-in
connection pooler. That gets me back to thinking that the two issues
are related, unless there's some other technique for dealing with the
need to derive snapshots.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Erik Aronesty 2009-06-04 11:31:44 Re: degenerate performance on one server of 3
Previous Message Marc Cousin 2009-06-04 08:22:14 Re: Scalability in postgres