Re: Built-in connection pooling

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Built-in connection pooling
Date: 2018-04-23 20:14:45
Message-ID: CA+TgmoZEhwHW1aJYE-MMUrT8yBshQODfKiTQTQbhv87fT5gxYQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Well, maybe I missed something, but I do not know how to efficiently
>> support
>> 1. Temporary tables
>> 2. Prepared statements
>> 3. Session GUCs
>> with any external connection pooler (with pooling level other than
>> session).
>
> Me neither. What makes it easier to do these things in an internal
> connection pooler? What could the backend do differently, to make these
> easier to implement in an external pooler?

I think you and Konstantin are possibly failing to see the big picture
here. Temporary tables, prepared statements, and GUC settings are
examples of session state that users expect will be preserved for the
lifetime of a connection and not beyond; all session state, of
whatever kind, has the same set of problems. A transparent connection
pooling experience means guaranteeing that no such state vanishes
before the user ends the current session, and also that no such state
established by some other session becomes visible in the current
session. And we really need to account for *all* such state, not just
really big things like temporary tables and prepared statements and
GUCs but also much subtler things such as the state of the PRNG
established by srandom().
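
To make the PRNG case concrete, here is a minimal sketch, written in
Python rather than backend C. Python's module-level random generator
stands in for the process-wide state that srandom() sets, and
run_session() is a hypothetical helper, not anything in PostgreSQL.
It shows a seed set by one session determining what the next session
sees when the same process is reused without resetting that state.

```python
import random


def run_session(seed=None):
    """Simulate one client session on a reused backend process.

    The module-level generator in `random` plays the role of the
    process-wide PRNG state; nothing resets it between sessions.
    """
    if seed is not None:
        random.seed(seed)   # analogous to a session calling srandom()
    return random.random()  # analogous to a later random() call


# Session A seeds the generator and draws one value.
first = run_session(seed=42)

# Session B never seeds, yet its draw continues A's sequence.
leaked = run_session()

# Replaying seed 42 from scratch reproduces both draws, which shows
# that session B observed state established by session A.
random.seed(42)
replay = [random.random(), random.random()]
```

Here first equals replay[0] and leaked equals replay[1]: the second
session's "random" value was fixed by the first session's seed.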

This is really very similar to the problem that parallel query has
when spinning up new worker backends. As far as possible, we want the
worker backends to have the same state as the original backend.
However, there's no systematic way of being sure that every relevant
backend-private global, including perhaps globals added by loadable
modules, is in exactly the same state. For parallel query, we solved
that problem by copying a bunch of things that we knew were
commonly-used (cf. parallel.c) and by requiring functions to be
labeled as parallel-restricted if they rely on any other state.
The problem for connection pooling is much harder. If you only ever
ran parallel-safe functions throughout the lifetime of a session, then
you would know that the session has no "hidden state" other than what
parallel.c already knows about (except for any functions that are
mislabeled, but we can say that's the user's fault for mislabeling
them). But as soon as you run even one parallel-restricted or
parallel-unsafe function, there might be a global variable someplace
that holds arbitrary state which the core system won't know anything
about. If you want to have some other process take over that session,
you need to copy that state to the new process; if you want to reuse
the current process for a new session, you need to clear that state.
Since you don't know it exists or where to find it, and since the code
to copy and/or clear it might not even exist, you can't.
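
The reasoning above can be sketched as a simple bookkeeping rule. This
is only an illustration in Python: the Session class and the function
names are hypothetical, and only the three labels correspond to
PostgreSQL's real PARALLEL SAFE / RESTRICTED / UNSAFE markings. The
point is that a session stays "clean" only while everything it has run
was parallel-safe; after that, the most we can say is that hidden
state may exist, not where it lives.

```python
from dataclasses import dataclass, field

# Labels mirroring PostgreSQL's parallel-safety markings.
SAFE = "safe"
RESTRICTED = "restricted"
UNSAFE = "unsafe"


@dataclass
class Session:
    """Hypothetical record of what a session has executed so far."""

    executed: list = field(default_factory=list)

    def run(self, func_name, label):
        # Remember the label of every function the session calls.
        self.executed.append((func_name, label))

    def may_have_hidden_state(self):
        # Only parallel-safe functions are known not to depend on
        # state beyond what the core system already copies.
        return any(label != SAFE for _, label in self.executed)


session = Session()
session.run("some_safe_func", SAFE)            # hypothetical function
clean_before = session.may_have_hidden_state()

session.run("some_ext_func", RESTRICTED)       # hypothetical function
clean_after = session.may_have_hidden_state()
```

Note what the flag does not provide: it says nothing about which
global was touched or how to copy or clear it, which is exactly the
missing piece.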

In other words, transparent connection pooling is going to require
some new mechanism, which third-party code will have to know about,
for tracking every last bit of session state that might need to be
preserved or cleared. That's going to be a big project. Maybe some
of that can piggyback on existing infrastructure like
InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
state to deal with. And no out-of-core pooler has a chance of
handling all that stuff correctly; an in-core pooler will be able to
do so only with a lot of work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
