Re: Are many idle connections bad?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Are many idle connections bad?
Date: 2015-07-26 00:43:06
Message-ID: CAMkU=1yqM4xay2RB8nqUDKVot00utWfDLfg35CNBbFdCdz9r0Q@mail.gmail.com
Lists: pgsql-performance

On Sat, Jul 25, 2015 at 7:50 AM, Craig James <cjames(at)emolecules(dot)com> wrote:

> The canonical advice here is to avoid more connections than you have CPUs,
> and to use something like pg_pooler to achieve that under heavy load.
>
> We are considering using the Apache mod_perl "fast-CGI" system and perl's
> Apache::DBI module, which caches persistent connections in order to improve
> performance for lightweight web requests. Due to the way our customers are
> organized (a separate schema per client company),
>

And presumably with a different PostgreSQL user to go with each schema?

> it's possible that there would be (for example) 32 fast-CGI processes,
> each of which had hundreds of cached connections open at any given time.
> This would result in a thousand or so Postgres connections on a machine
> with 32 CPUs.
>

Why would it need so many cached connections per fast-CGI process? Could
you set up affinity so that the same client (or at least the same web
session) usually ends up at the same fast-CGI process (when it is
available), so the other fast-CGI processes don't need to cache DBI
connections for every DB user, but just for the ones they habitually serve?

>
> But, Apache's fast-CGI mechanism allows you to specify the maximum number
> of fast-CGI processes that can run at one time; requests are queued by the
> Apache server if the load exceeds this maximum. That means that there would
> never be more than a configured maximum number of active connections; the
> rest would be idle.
>
> So we'd have a situation where there could be thousands of
> connections, but the actual workload would be throttled to any limit we
> like. We'd almost certainly limit it to match the number of CPUs.
>
> So the question is: do idle connections impact performance?
>

In my hands, truly idle connections are very, very cheap, other than the
general overhead of having a process in the process table and some local
memory. Where people usually run into trouble are:

1) that the idle connections are only idle "normally", and as soon as the
system runs into trouble the app starts trying to use all of those
usually-idle connections. So you get increased use at the exact moment
when you can't deal with it--when the system is already under stress. It
sounds like you have that base covered.

2) That the idle connections are "idle in transaction", not truly idle, and
this causes a variety of troubles, like vacuum not working effectively and
hint bits that are permanently unsettable.

2b) A special case of 2 is that a transaction has inserted a bunch of
uncommitted tuples and then gone idle (or is just doing some other
time-consuming thing) before either committing them or rolling them back.
This can create an enormous amount of contention on the proclock, as every
process which stumbles across the tuple then has to ask every other active
process "Is this your tuple? Are you done with it?". This could be
particularly problematic if, for example, you are bulk loading a vendor
catalog in a single transaction and therefore have a bunch of uncommitted
tuples that are hanging around for a long time.
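
To tell case 2 apart from truly idle connections, you can look at the state
column of pg_stat_activity. Below is a minimal sketch in Python, not a
finished monitoring tool. The function name summarize_sessions and the
five-minute threshold are my own assumptions for illustration, and the rows
are expected to come from running the query shown through whatever driver
you already use.

```python
from collections import Counter
from datetime import timedelta

# Query to feed summarize_sessions(); pg_stat_activity has exposed the
# "state" column since PostgreSQL 9.2.
SESSION_SQL = """
SELECT pid, usename, state, xact_start
FROM pg_stat_activity
"""


def summarize_sessions(rows, now, max_xact_age=timedelta(minutes=5)):
    """Count sessions per state and list stale idle-in-transaction pids.

    rows: iterable of (pid, usename, state, xact_start) tuples.
    now:  current time, using the same timezone awareness as xact_start.
    max_xact_age: hypothetical threshold, pick one that suits your app.
    """
    rows = list(rows)  # allow generators, since we iterate twice
    counts = Counter(state for _pid, _user, state, _start in rows)
    stale = [
        pid
        for pid, _user, state, xact_start in rows
        # Matches both "idle in transaction" and
        # "idle in transaction (aborted)".
        if state is not None
        and state.startswith("idle in transaction")
        and xact_start is not None
        and now - xact_start > max_xact_age
    ]
    return counts, stale
```

A large count under "idle" is the cheap case described above. Pids returned
in the stale list are the ones worth chasing down in the application.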

If you have a reasonably good load generator, it is pretty easy to spin up a
bunch of idle connections and see what happens on your own hardware with
your own workload and your own version of PostgreSQL.
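
A rough sketch of such a test in Python follows. It is only an outline under
some assumptions: connect is any zero-argument callable that returns a
DB-API connection (for example a small wrapper around your driver's connect
function), and the helper names open_idle_connections, close_all and
time_workload are made up for this example.

```python
import time


def close_all(conns):
    """Close every connection, ignoring errors from already-dead ones."""
    for conn in conns:
        try:
            conn.close()
        except Exception:
            pass


def open_idle_connections(connect, count):
    """Open `count` connections via connect() and leave them untouched.

    connect is a hypothetical zero-argument factory supplied by the caller.
    No statement is issued on the connections, so they should sit idle.
    """
    conns = []
    try:
        for _ in range(count):
            conns.append(connect())
    except Exception:
        # Do not leak the connections opened so far (e.g. when
        # max_connections is reached part way through).
        close_all(conns)
        raise
    return conns


def time_workload(workload, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best
```

The idea is to call time_workload on your real workload once with no extra
connections and once after open_idle_connections(connect, 1000), then
compare the two numbers. max_connections has to be raised far enough to
admit the idle sessions, and it is worth checking pg_stat_activity to
confirm that they show up as "idle" rather than "idle in transaction".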

Cheers,

Jeff
