From: | Don Seiler <don(at)seiler(dot)us> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: High COMMIT times |
Date: | 2021-01-11 00:42:24 |
Message-ID: | CAHJZqBCR6ermxmbJv2XBW73aSwU3mtknhwvKC8xgOAnYBKAyzA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Sat, Jan 9, 2021 at 2:07 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> How are you monitoring the COMMIT times? What do you generally see in
> pg_stat_activity.wait_event during the spikes/stalls?
>
Right now we just observe the COMMIT duration posted in the postgresql log
(we log anything over 100ms).
One other thing that I shamefully forgot to mention. When we see these slow
COMMITs in the log, they coincide with a connection storm (Cat 5 hurricane)
from our apps where connections will go from ~200 to ~1200. This will
probably disgust many, but our PG server's max_connections is set to 2000.
We have a set of pgbouncers in front of this with a total
max_db_connections of 1600. I know many of you think this defeats the whole
purpose of having pgbouncer and I agree. I've been trying to explain as
much and that even with 32 CPUs on this DB host, we probably shouldn't
expect to be able to support more than 100-200 active connections, let
alone 1600. I'm still pushing to have our app server instances (which also
use their own JDBC (Hikari) connection pool and *then* go through
pgbouncer) to lower their min/max connection settings but obviously it's
sort of counterintuitive at first glance but hopefully everyone sees the
bigger picture.
One nagging question I have is if the slow COMMIT is triggering the
connection storm (eg app sees slow response or timeout from a current
connection and fires off a new connection in its place), or vice-versa.
We're planning to deploy new performant cloud storage (Azure Ultra disk)
just for WAL logs but I'm hesitant to say it'll be a silver bullet when we
still have this insane connection management strategy in place.
Curious to know what others think (please pull no punches) and if others
have been in a similar scenario with anecdotes to share.
Thanks,
Don.
--
Don Seiler
www.seiler.us
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Jackson | 2021-01-11 15:05:24 | Re: High COMMIT times |
Previous Message | Pavel Stehule | 2021-01-10 16:54:48 | Re: proposal: schema variables |