Re: One process per session lack of sharing

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, AMatveev(at)bitec(dot)ru, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: One process per session lack of sharing
Date: 2016-07-14 05:34:31
Message-ID: CAMsr+YH9tGpknN5H-T=9=vK8yP-M0jTrB_a5BQj6GeY2K2pTaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 July 2016 at 03:59, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I agree that there's not really a plan to implement this, but I don't
> agree that connection pooling solves the whole problem. Most people
> can't get by with statement pooling, so in practice you are looking at
> transaction pooling or session pooling. And that means that you can't
> really keep the pool size as small as you'd like because backends can
> be idle in transaction for long enough to force the pool size to be
> pretty large. Also, pooling causes the same backends to get reused
> for different sessions which touch different relations and different
> functions so that, for example, the relcache and the PL/pgsql function
> caches grow until every one of those sessions has everything cached
> that any client needs. That can cause big problems.
>
> So, I actually think it would be a good idea to think about this.

I agree. It's been on my mind for a while, but I've been assuming it's
likely to involve such architectural upheaval as to be impractical.

Right now PostgreSQL conflates "user session state" and "execution engine"
into one process. This means we need an external connection pooler to
handle things if we want more user connections than we can efficiently
handle in terms of number of executors. Current poolers don't do much to
keep track of user state, they just arbitrate access to executors and
expect applications to re-establish any needed state (SET vars, LISTEN,
etc) or not use features that require persistence across the current
pooling level.

This leaves users in the hard position of using very high, inefficient
max_connections values to keep track of application<->DB state or jump
through awkward hoops to use transaction pooling, either at the application
level (Java appserver pools, etc) or through a proxy.

If using the high max_connections approach the user must also ensure that
they don't have all those max_connections actually doing work at the same
time using some kind of external coordination. Otherwise they'll thrash the
server and face out of memory issues (especially with our rather simplistic
work_mem management, etc) and poor performance.

The solution, to me, is to separate "user state" and "executor". Sounds
nice, but we use global variables _everywhere_ and it's assumed throughout
the code that we have one user session for the life of a backend, though
with some exceptions for SET SESSION AUTHORIZATION. It's not likely to be
fun.

The
> problem, of course, is that as long as we allow arbitrary parts of the
> code - including extension code - to declare global variables and
> store arbitrary stuff in them without any coordination, it's
> impossible to imagine hibernating and resuming a session without a
> risk of things going severely awry.

Yeah. We'd definitely need a session state management mechanism with save
and restore functionality.

There's also stuff like:

* LISTEN
* advistory locking at the session level
* WITH HOLD cursors

that isn't simple to just save and restore. Those are some of the same
things that are painful with transaction pooling right now.

> This was a major issue for
> parallel query, but we've solved it, mostly, by designating the things
> that rely on global variables as parallel-restricted, and there
> actually aren't a ton of those. So I think it's imaginable that we
> can get to a point where we can, at least in some circumstances, let a
> backend exit and reconstitute its state at a later time. It's not an
> easy project, but I think it is one we will eventually need to do.
>

I agree on both points, but I think "not easy" is rather an understatement.

Starting with a narrow scope would help. Save/restore GUCs and the other
easy stuff, and disallow sessions that are actively LISTENing, hold
advisory locks, have open cursors, etc from being saved and restored.

BTW, I think this would also give us a useful path toward allowing
connection poolers to change the active user and re-authenticate on an
existing backend. Right now you have to use SET ROLE or SET SESSION
AUTHORIZATION (ugh) and can't stop the client you hand the connection to
from just RESETing back to the pooler's user and doing whatever it wants.

> Insisting that the current model is working is just sticking our head
> in the sand. It's mostly working, but there are workloads where it
> fails badly - and competing database products survive a number of
> scenarios where we just fall on our face.
> <http://www.postgresql.org/mailpref/pgsql-hackers>
>

Yep, and like parallel query it's a long path, but it's one we've got to
face sooner or later.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-07-14 05:48:37 Re: One process per session lack of sharing
Previous Message Masahiko Sawada 2016-07-14 05:22:17 Fix typo in postgres_fdw/deparse.c