Re: Built-in connection pooling

From: Ryan Pedela <rpedela(at)datalanche(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Built-in connection pooling
Date: 2018-02-09 23:41:31
Message-ID: CACu89FSasoBzL7vaCq7nxDUB=DDzoXfBy_AXq6DgTojhbAsuzA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky <roji(at)roji(dot)org> wrote:

> Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
> like to go back to the basics on this.
>
> Unless I'm mistaken, at least in the Java and .NET world, clients are
> almost always expected to have their own connection pooling, either
> implemented inside the driver (ADO.NET model) or as a separate modular
> component (JDBC). This approach has a few performance advantages:
>
> 1. "Opening" a new pooled connection is virtually free - no TCP connection
> needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
> synchronization).
> 2. Important client state can be associated with physical connections. For
> example, prepared statements can be tracked on the physical connection, and
> persisted when the connection is returned to the pool. The next time the
> physical connection is returned from the pool, if the user tries to
> server-prepare a statement, we can check on the connection if it has
> already been prepared in a "previous lifetime", and if so, no need to
> prepare again. This is vital for scenarios with short-lived (pooled)
> connections, such as web applications. Npgsql does this.
>
> Regarding the problem of idle connections being kept open by clients, I'd
> argue it's a client-side problem. If the client is using a connection pool,
> the pool should be configurable to close idle connections after a certain
> time (I think this is relatively standard behavior). If the client isn't
> using a pool, it seems to be the application's responsibility to release
> connections when they're no longer needed.
>
> The one drawback is that the pooling is application-specific, so it can't
> be shared by multiple applications/hosts. So in some scenarios it may make
> sense to use both client pooling and proxy/server pooling.
>
> To sum it up, I would argue that connection pooling should first and
> foremost be considered as a client feature, rather than a proxy feature
> (pgpool) or server feature (the PostgreSQL pooling being discussed here).
> This isn't to say server-side pooling has no value though.
>
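
To make Shay's point 2 a bit more concrete for anyone who hasn't seen it in
practice, here is a minimal sketch in JDBC-style Java of the technique (an
illustration only, not Npgsql's actual implementation; the class and method
names are made up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    // Each physical (socket-backed) connection keeps a cache of the
    // statements it has already server-prepared, so a logical "open" from
    // the pool can reuse them without another prepare round trip.
    final class PhysicalConnection {
        private final Connection conn;
        private final Map<String, PreparedStatement> prepared = new HashMap<>();

        PhysicalConnection(Connection conn) {
            this.conn = conn;
        }

        // Called when application code asks for a server-prepared statement.
        PreparedStatement prepare(String sql) throws SQLException {
            PreparedStatement ps = prepared.get(sql);
            if (ps == null || ps.isClosed()) {
                // Not prepared in a "previous lifetime": prepare it now and remember it.
                ps = conn.prepareStatement(sql);
                prepared.put(sql, ps);
            }
            return ps; // already prepared earlier: reuse it, no re-prepare needed
        }

        // Called when the logical connection goes back to the pool: the
        // statements stay prepared on the backend, only parameters are cleared.
        void release() throws SQLException {
            for (PreparedStatement ps : prepared.values()) {
                ps.clearParameters();
            }
        }
    }

The server-prepared statements survive the logical close/open cycle, which is
what keeps short-lived pooled connections cheap for web workloads.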

Recently, I did a large amount of parallel data processing where the
results were stored in PG. I had about 1000 workers, each with its own PG
connection. As you pointed out, application pooling doesn't make sense in
this scenario. I tried pgpool and pgbouncer, and both ended up as the
bottleneck: overall throughput was not great, but it was highest without a
pooler. That aligns with Konstantin's benchmarks too. As far as I know,
server-side pooling is the only way to increase throughput for this use
case without upgrading hardware.

I hope this PR gets accepted!
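
P.S. On the idle-connection point: the client-side pools I've used do expose
that knob. A rough example with HikariCP, one common Java pool (not mentioned
upthread; purely an illustration of the setting, and the URL and sizes are
made up):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class IdleTimeoutExample {
        public static void main(String[] args) {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // hypothetical database
            cfg.setMaximumPoolSize(20);
            cfg.setMinimumIdle(0);      // let the pool shrink all the way down
            cfg.setIdleTimeout(60_000); // close connections idle for more than 60 seconds
            try (HikariDataSource ds = new HikariDataSource(cfg)) {
                // hand ds to the application; the pool's housekeeping thread
                // reaps idle physical connections after the timeout
            }
        }
    }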
