Re: Built-in connection pooler

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Ryan Lambert <ryan(at)rustprooflabs(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Dimitri Fontaine <dim(at)tapoueh(dot)org>
Subject: Re: Built-in connection pooler
Date: 2019-07-25 12:00:30
Message-ID: 42c78b0e-e2ad-7920-3b31-37a81c9ba83b@postgrespro.ru
Lists: pgsql-hackers

Hello Ryan,

Thank you very much for the review and benchmarking.
My answers are inline.

On 25.07.2019 0:58, Ryan Lambert wrote:
>
> Applying the patch [1] has improved from v9, still getting these:
>
Fixed.

> I used a DigitalOcean droplet with 2 CPU and 2 GB RAM and SSD for this
> testing, Ubuntu 18.04.  I chose the smaller server size based on the
> availability of similar and recent results around connection pooling
> [2] that used AWS EC2 m4.large instance (2 cores, 8 GB RAM) and
> pgbouncer.  Your prior pgbench tests [3] also focused on larger
> servers so I wanted to see how this works on smaller hardware.
>
> Considering this from connpool.sgml:
> "<varname>connection_proxies</varname> specifies number of connection
> proxy processes which will be spawned. Default value is zero, so
> connection pooling is disabled by default."
>
> That hints to me that connection_proxies is the main configuration to
> start with so that was the only configuration I changed from the
> default for this feature.  I adjusted shared_buffers to 500MB (25% of
> total) and max_connections to 1000.  Only having one proxy gives
> subpar performance across the board, so did setting this value to 10. 
> My hunch is this value should roughly follow the # of cpus available,
> but that's just a hunch.
>

I do not think that the number of proxies should depend on the number of
CPUs. A proxy process does not perform any computation; it just redirects
data from client to backend and vice versa.
Certainly, starting from some number of connections, the proxy becomes a
bottleneck. The same is true for pgbouncer: on a machine with a large
number of cores you need to start several pgbouncer instances to be able
to utilize all resources and provide the best performance.
The optimal value depends greatly on the workload; it is difficult to
suggest a formula that calculates the optimal number of proxies for every
configuration.
> I don't understand yet how max_sessions ties in.
> Also, having both session_pool_size and connection_proxies seemed
> confusing at first.  I still haven't figured out exactly how they
> relate together in the overall operation and their impact on performance.

"max_sessions" is mostly technical parameter. To listen client
connections I need to initialize WaitEvent set specdify maximal number
of events.
It should not somehow affect performance. So just  specifying large
enough value should work in most cases.
But I do not want to hardcode some constants and that it why I add GUC
variable.

"connections_proxies" is used mostly to toggle connection pooling.
Using more than 1 proxy is be needed only for huge workloads (hundreds
connections).

And "session_pool_size" is core parameter  which determine efficiency of
pooling.
The main trouble with it now, is that it is per database/user
combination. Each such combination will have its own connection pool.
Choosing optimal value of pooler backends is non-trivial task. It
certainly depends on number of available CPU cores.
But if backends and mostly disk-bounded, then optimal number of pooler
worker can be large than number of cores.
Presence of several pools make this choice even more complicated.
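
Just for illustration, a starting point on a small two-core machine like
the one you tested might look like the snippet below. The values are my
guesses for this discussion, not recommendations, and would need tuning
for the actual workload:

    # postgresql.conf (illustrative values only)
    connection_proxies = 2     # non-zero enables pooling; >1 only for very large workloads
    session_pool_size = 10     # pooled backends per database/user combination
    max_sessions = 1000        # max client sessions (sizes the WaitEventSet); just make it large enough
    max_connections = 1000
    shared_buffers = 500MB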

> The new view helped, I get the concept of **what** it is doing
> (connection_proxies = more rows, session_pool_size = n_backends for
> each row), it's more a lack of understanding the **why** regarding how
> it will operate.
>
>
> postgres=# select * from pg_pooler_state();
>  pid  | n_clients | n_ssl_clients | n_pools | n_backends | n_dedicated_backends | tx_bytes  | rx_bytes  | n_transactions
> ------+-----------+---------------+---------+------------+----------------------+-----------+-----------+----------------
>  1682 |        75 |             0 |       1 |         10 |                    0 | 366810458 | 353181393 |        5557109
>  1683 |        75 |             0 |       1 |         10 |                    0 | 368464689 | 354778709 |        5582174
> (2 rows)
>
>
>
> I am not sure how I feel about this:
> "Non-tainted backends are not terminated even if there are no more
> connected sessions."
The PgPRO EE version of the connection pooler has an
"idle_pool_worker_timeout" parameter which allows idle workers to be
terminated.
It would be possible to implement it for the vanilla version of the pooler
as well, but the primary intention of this patch was to minimize changes
in the Postgres core.
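
For reference, in PgPRO EE it is just another GUC; something like the line
below (the value and unit are only an example for this discussion, not a
suggested default, and assume the parameter accepts a usual time value):

    idle_pool_worker_timeout = '5min'   # example only: terminate a pooler worker idle longer than this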

>
> Would it be possible (eventually) to monitor connection rates and free
> up non-tainted backends after a time?  The way I'd like to think of
> that working would be:
>
> If 50% of backends are unused for more than 1 hour, release 10% of
> established backends.

>
> The two percentages and time frame would ideally be configurable, but
> setup in a way that it doesn't let go of connections too quickly,
> causing unnecessary expense of re-establishing those connections.  My
> thought is if there's one big surge of connections followed by a long
> period of lower connections, does it make sense to keep those extra
> backends established?
>

I think that an idle timeout is enough, but more complicated logic can
also be implemented.
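
In the meantime, the pg_pooler_state() view you showed can at least be
used to eyeball how busy the pools are. A rough sketch, assuming that
comparing n_clients and n_transactions with n_backends is an acceptable
proxy for utilization (the view does not report idle time directly):

    SELECT pid,
           n_clients,
           n_backends,
           round(n_transactions::numeric / greatest(n_backends, 1)) AS tx_per_backend
    FROM pg_pooler_state();

The counters are cumulative, so comparing two snapshots taken some time
apart would show whether the pooled backends are actually doing work.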

>
> I'll give the documentation another pass soon.  Thanks for all your
> work on this, I like what I'm seeing so far!
>
Thank you very much.
I have attached a new version of the patch with the indentation problems
fixed and Win32-specific fixes.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
builtin_connection_proxy-11.patch text/x-patch 130.0 KB
