Re: Optimizing Database High CPU

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: Scottix <scottix(at)gmail(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Optimizing Database High CPU
Date: 2019-03-04 20:03:33
Message-ID: CAMkU=1yTGLe5Frc6QC0JK-3-vO8XYWHhP4avGoUQ5TUGr=Zvqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> If those 50-100 connections are all active at once, yes, that is high.
>> They can easily spend more time fighting each other over LWLocks,
>> spinlocks, or cachelines rather than doing useful work. This can be
>> exacerbated when you have multiple sockets rather than all cores in a
>> single socket. And these problems are likely to present as high Sys times.
>>
>> Perhaps you can put up a connection pooler which will allow 100
>> connections to all think they are connected at once, but forces only 12 or
>> so to actually be active at one time, making the others transparently queue.
>>
>
> Can you expound on this or refer me to someplace to read up on this?
>

Just based on my own experimentation. This is not a blanket
recommendation, but specific to the situation that we already suspect
there is contention, and the server is too old to have
pg_stat_actvity.wait_event
column.

> Context, I don't want to thread jack though: I think I am seeing similar
> behavior in our environment at times with queries that normally take
> seconds taking 5+ minutes at times of high load. I see many queries showing
> buffer_mapping as the LwLock type in snapshots but don't know if that may
> be expected.
>

It sounds like your processes are fighting to reserve buffers in
shared_buffers in which to read data pages. But those data pages are
probably already in the OS page cache, otherwise reading it from disk would
be slow enough that you would be seeing some type of IO wait, or buffer_io,
rather than buffer_mapping as the dominant wait type. So I think that
means you have most of your data in RAM, but not enough of it in
shared_buffers. You might be in a rare situation where setting
shared_buffers to a high fraction of RAM, rather than the usual low
fraction, is called for. Increasing NUM_BUFFER_PARTITIONS might also be
useful, but that requires a recompilation of the server. But do these
spikes correlate with anything known at the application level? A change in
the mix of queries, or a long report or maintenance operation? Maybe the
query plans briefly toggle over to using seq scans rather than index scans
or vice versa, which drastically changes the block access patterns?

> In our environment PgBouncer will accept several hundred connections and
> allow up to 100 at a time to be active on the database which are VMs with
> ~16 CPUs allocated (some more, some less, multi-tenant and manually
> sharded). It sounds like you are advocating for connection max very close
> to the number of cores. I'd like to better understand the pros/cons of that
> decision.
>

There are good reasons to allow more than that. For example, your
application holds some transactions open briefly while it does some
cogitation on the application-side, rather than immediately committing and
so returning the connection to the connection pool. Or your server has a
very high IO capacity and benefits from lots of read requests in the queue
at the same time, so it can keep every spindle busy and every rotation
productive. But, if you have no reason to believe that any of those
situations apply to you, but do have evidence that you have lock contention
between processes, then I think that limiting the number active processes
to the number of cores is a good starting point.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2019-03-04 20:54:23 Re: Performance comparison between Pgsql 10.5 and Pgsql 11.2
Previous Message Justin Pryzby 2019-03-04 18:31:50 Re: query logging of prepared statements