Re: Built-in connection pooling

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Built-in connection pooling
Date: 2018-01-19 16:59:11
Message-ID: 7bc1e831-54e6-b2db-f083-2d26e9ae0dd7@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/19/2018 05:17 PM, Konstantin Knizhnik wrote:
>
>
> On 19.01.2018 18:53, Tomas Vondra wrote:
>>
>> ...
>>
>> The questions I'm asking myself are mostly these:
>>
>> 1) When assigning a backend, we first try to get one from a pool, which
>> happens right at the beginning of BackendStartup. If we find a usable
>> backend, we send the info to the backend (pg_send_sock/pg_recv_sock).
>>
>> But AFAICS this only only happens at connection time, right? But it your
>> initial message you say "Rescheduling is done at transaction level,"
>> which in my understanding means "transaction pooling". So, how does that
>> part work?
>
> Here it is:
>
>               ChooseSession:
> ...
>

OK, thanks.

>>
>> 2) How does this deal with backends for different databases? I
>> don't see any checks that the requested database matches the
>> backend database (not any code switching the backend from one db to
>> another - which would be very tricky, I think).
> As I wrote in the initial mail this problem is not handled now.
> It is expected that all clients are connected to the same database using
> the same user.
> I only check and report an error if this assumption is violated.
> Definitely it should be fixed. And it is one of the main challenge with
> this approach! And I want to receive some advices from community about
> the best ways of solving it.
> The problem is that we get information about database/user in
> ProcessStartupPackage function in the beackend, when session is already
> assigned to the particular backend.
> We either have to somehow redirect session to some other backend
> (somehow notify postmaster that we are not able to handle it)?
> either obtain database/user name in postmaster. But it meas that
> ProcessStartupPackage should be called in postmaster and Postmaster has
> to read from client's socket.
> I afraid that postmaster can be a bottleneck in this case.
>

Hmmm, that's unfortunate. I guess you'll have process the startup packet
in the main process, before it gets forked. At least partially.

> The problem can be much easily solved in case of using pthread version
> of Postgres. In this case reassigning session to another executor
> (thread) can be don much easily.
> And there is no need to use unportable trick with passing fiel
> descriptor to other process.
> And in future I am going to combine them. The problem is that pthread
> version of Postgres is still in very raw state.
>

Yeah. Unfortunately, we're using processes now, and switching to threads
will take time (assuming it happens at all).

>> 3) Is there any sort of shrinking the pools? I mean, if the backend is
>> idle for certain period of time (or when we need backends for other
>> databases), does it get closed automatically?
>
> When client is disconnected, client session is closed. But backen is not
> terminated even if there are no more sessions at this backend.
> It  was done intentionally, to avoid permanent spawning of new processes
> when there is one or few clients which frequently connect/disconnect to
> the database.

Sure, but it means a short peak will exhaust the backends indefinitely.
That's acceptable for a PoC, but I think needs to be fixed eventually.

>>
>> Furthermore, I'm rather confused about the meaning of session_pool_size.
>> I mean, that GUC determines the number of backends in the pool, it has
>> nothing to do with sessions per se, right? Which would mean it's a bit
>> misleading to name it "session_..." (particularly if the pooling happens
>> at transaction level, not session level - which is question #1).
>>
> Yehh, yes it is not right name. It means maximal number of backends
> which should be used to serve client's sessions.
> But "max backends" is already used and has completely different meaning.
>
>> When I've been thinking about adding a built-in connection pool, my
>> rough plan was mostly "bgworker doing something like pgbouncer" (that
>> is, listening on a separate port and proxying everything to regular
>> backends). Obviously, that has pros and cons, and probably would not
>> work serve the threading use case well.
>
> And we will get the same problem as with pgbouncer: one process will not
> be able to handle all connections...
> Certainly it is possible to start several such scheduling bgworkers...
> But in any case it is more efficient to multiplex session in backend
> themselves.
>

Well, I haven't said it has to be single-threaded like pgbouncer. I
don't see why the bgworker could not use multiple threads internally (of
course, it'd need to be not to mess the stuff that is not thread-safe).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-19 16:59:59 Re: pgsql: Local partitioned indexes
Previous Message Konstantin Knizhnik 2018-01-19 16:53:45 Re: Built-in connection pooling