Re: Built-in connection pooling

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Built-in connection pooling
Date: 2018-01-29 08:57:36
Message-ID: a1fdf4eb-4ee0-b35e-34e9-36fadfcbceff@postgrespro.ru
Lists: pgsql-hackers

On 28.01.2018 03:40, Bruce Momjian wrote:
> On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
>>> Yes, external connection pooling is more flexible. It allows pooling
>>> to be performed either at the client side or at the server side (or
>>> even a combination of the two approaches).
>>> Also, external connection pooling for PostgreSQL is not limited to
>>> pgbouncer/pgpool.
>>> There are many frameworks maintaining their own connection pool, for
>>> example J2EE, jboss, hibernate,...
>>> I have a feeling that about 70% of enterprise systems working with
>>> databases are written in Java and doing connection pooling in their
>>> own way.
>> True, but that does not really mean we don't need "our" connection
>> pooling (built-in or not). The connection pools are usually built into
>> the application servers, so each application server has their own
>> independent pool. With larger deployments (a couple of application
>> servers) that quickly causes problems with max_connections.
> I found this thread and the pthread thread very interesting. Konstantin,
> thank you for writing prototypes and giving us very useful benchmarks
> for ideas I thought I might never see.
>
> As much as I would like to move forward with coding, I would like to
> back up and understand where we need to go with these ideas.
>
> First, it looks like pthreads and a builtin pooler help mostly with
> 1000+ connections. It seems like you found that pthreads wasn't
> sufficient and the builtin pooler was better. Is that correct?

The brief answer is yes.
Pthreads allow minimizing per-connection overhead and make it possible
to obtain better results for a large number of connections.
But there is a fundamental problem: a Postgres connection is a
"heavyweight" object: each connection maintains its own private caches of
the catalog, relations, temporary table pages, prepared statements,...
So even though pthreads allow minimizing per-connection memory usage,
that saving is negligible compared with all these connection-private
memory resources. It means that we still need to use connection pooling.

Pthreads provide two main advantages:
1. Simpler interaction between different workers: no need to use shared
memory with its fixed size limitation and the impossibility of using
normal pointers into dynamic shared memory, and no need to implement a
specialized memory allocator for shared memory (see the sketch below).
This makes the implementation of parallel query execution and built-in
connection pooling much easier.
2. More efficient virtual-to-physical address translation: there is no
need to maintain a separate address space for each backend, so the TLB
(translation lookaside buffer) is used more effectively.

So it is not completely correct to consider session pooling as an
alternative to pthreads.
Ideally these two approaches should be combined.
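
To illustrate the first point, here is a minimal sketch (purely
illustrative, not code from my prototype) of why per-process dynamic
shared memory forces offset-based addressing, while a threaded model can
keep ordinary pointers:

#include <stddef.h>

/*
 * Process model: a dynamic shared memory segment may be mapped at a
 * different address in every backend, so a shared structure cannot hold
 * raw pointers and has to store offsets from the segment base instead.
 */
typedef struct shm_node
{
    size_t  next_off;   /* offset of the next node in the segment, 0 = none */
    int     value;
} shm_node;

#define NODE_AT(seg_base, off) \
    ((shm_node *) ((char *) (seg_base) + (off)))

/*
 * Thread model: all workers share one address space, so the same
 * structure can simply keep an ordinary pointer.
 */
typedef struct thr_node
{
    struct thr_node *next;  /* valid as-is in every worker thread */
    int              value;
} thr_node;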

>
> Is there anything we can do differently about allowing long-idle
> connections to reduce their resource usage, e.g. free their caches?
> Remove from PGPROC? Could we do it conditionally, e.g. only sessions
> that don't have open transactions or cursors?
I think that the best approach is to switch to global (shared) caches
for execution plans, catalog entries,...
Most of the time these metadata caches are identical for all clients,
so it is just a waste of memory and time to maintain them separately in
each backend.
Certainly a shared cache requires some synchronization, which can become
a point of contention and cause a significant degradation of performance.
But taking into account that metadata is updated much more rarely than
data, I hope that copy-on-write and atomic operations can solve this
problem.
And it can give a lot of other advantages. For example, it will become
possible to spend more time in the optimizer to find the optimal
execution plan, and to manually store plans for future use.
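
Just to sketch the copy-on-write idea (using C11 atomics purely for
illustration; PostgreSQL would of course use its own atomics and
shared-memory infrastructure, and safe reclamation of old entries is
left out here):

#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* An immutable cache entry: it never changes after it is published. */
typedef struct plan_entry
{
    char   *query;
    char   *plan;               /* serialized plan text, purely illustrative */
} plan_entry;

/* One shared slot; readers only load it, writers copy and swap. */
static _Atomic(plan_entry *) shared_slot;

/* Reader path: a single atomic load, no lock taken. */
static plan_entry *
lookup_plan(void)
{
    return atomic_load_explicit(&shared_slot, memory_order_acquire);
}

/*
 * Writer path: build a fresh entry and publish it with one atomic store.
 * Reclaiming the previous entry is deliberately omitted; a real
 * implementation needs reference counting or another safe-reclamation
 * scheme so that readers still using the old copy are not broken.
 */
static void
publish_plan(const char *query, const char *plan)
{
    plan_entry *e = malloc(sizeof(plan_entry));

    e->query = strdup(query);
    e->plan = strdup(plan);
    atomic_store_explicit(&shared_slot, e, memory_order_release);
}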

> It feels like user and db mismatches are always going to cause pooling
> problems. Could we actually exit and restart connections that have
> default session state?

Well, combining multiuser access and connection pooling is really a
challenged problem.
I do not know the best solution for it now. It will be much simpler to
find solution with pthreads model...

Most of enterprise systems are using pgbouncer or similar connection
pooler. In pgbouncer in statement/transaction pooling mode access to the
database is performed under the same user. So it means that many existed
statements are built in the assumption that database is accessed in this
manner.

Concerning the "default session state": one of the main drawbacks of
pgbouncer and other external poolers is that they do not allow the use
of prepared statements.
That leads to up to a two-times performance penalty on typical OLTP
queries. One of the main ideas of built-in session pooling was to
eliminate this limitation.
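
For illustration, this is the usual libpq pattern that breaks behind a
transaction-level pooler, because the statement is prepared in one
backend while a later execution may be routed to another (the table and
statement names here are made up):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Prepare once: the prepared statement lives in this particular backend. */
    PGresult *res = PQprepare(conn, "get_item",
                              "SELECT name FROM items WHERE id = $1",
                              1, NULL);
    PQclear(res);

    /*
     * Execute many times, skipping parse/plan each time.  Behind a
     * transaction-pooling pgbouncer this step may hit a backend that has
     * never seen the "get_item" statement and fail.
     */
    const char *params[1] = {"42"};
    res = PQexecPrepared(conn, "get_item", 1, params, NULL, NULL, 0);
    printf("status: %s\n", PQresStatus(PQresultStatus(res)));

    PQclear(res);
    PQfinish(conn);
    return 0;
}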

> Right now, if you hit max_connections, we start rejecting new
> connections. Would it make sense to allow an option to exit idle
> connections when this happens so new users can connect?
It would require changes in client applications, wouldn't it? They would
have to be prepared for the server dropping the connection at any moment.
I do not know whether it is possible to drop an idle connection and hide
that fact from the client. In my implementation each session keeps the
minimal information required for interaction with the client (the
session context). It includes the socket, the struct Port and a session
memory context which should be used instead of TopMemoryContext for
session-specific data.
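
Roughly speaking (this is only a sketch of the idea, the field names are
invented and it is not the actual code of the patch), the session
context amounts to something like:

#include "postgres.h"
#include "libpq/libpq-be.h"     /* struct Port */
#include "utils/palloc.h"       /* MemoryContext */

/*
 * Sketch of the per-session state kept while a backend is multiplexed
 * between clients.  Field names are illustrative only.
 */
typedef struct SessionContext
{
    pgsocket        sock;       /* client socket */
    Port           *port;       /* connection info (auth, options, ...) */
    MemoryContext   memory;     /* used instead of TopMemoryContext
                                 * for session-specific allocations */
} SessionContext;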

>
> I know we have relied on external connection poolers to solve all the
> high connection problems but it seems there might be simple things we
> can do to improve matters. FYI, I did write a blog entry comparing
> external and internal connection poolers:
>
> https://momjian.us/main/blogs/pgblog/2017.html#April_21_2017
>

I completely agree with your arguments in this post.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
