Re: connections not getting closed on a replica

From: "FarjadFarid\(ChkNet\)" <farjad(dot)farid(at)checknetworks(dot)com>
To: "'Kevin Grittner'" <kgrittn(at)gmail(dot)com>, "'Carlo Cabanilla'" <carlo(at)datadoghq(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: connections not getting closed on a replica
Date: 2015-12-12 19:25:14
Message-ID: 00a801d13512$d39bc6a0$7ad353e0$@checknetworks.com
Lists: pgsql-general

Assuming you have at least 16GB of memory, these numbers are not a real problem on good server hardware. With a bad server motherboard you might as well use a standard PC. With 32GB I have tested 10 times more connections, though not to PostgreSQL.

I would investigate everything from the bottom up.

Also, under TCP/IP the flow and validity of the transaction are guaranteed, so I would look for other issues that are locking up the system.

For a good motherboard design check out Intel's motherboards.

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Kevin Grittner
Sent: 11 December 2015 22:13
To: Carlo Cabanilla
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] connections not getting closed on a replica

On Fri, Dec 11, 2015 at 3:37 PM, Carlo Cabanilla <carlo(at)datadoghq(dot)com> wrote:

> 16 cores

> a default pool size of 650, steady state of 500-600 server connections

With so many more connections than resources to serve them, one thing that can happen is that, just by happenstance, enough processes become busy at one time that they start context switching a lot before they finish, leaving spinlocks blocked and causing other contention that slows all query run times. This causes bloat to increase because some database transactions are left active for longer times. If the client software and/or pooler don't queue requests at that point, more connections will be made, because the contention-induced slowness has kept existing connections from being freed. That exacerbates the problem and leads to a downward spiral. It can become so bad that there is no recovery until either the client software is stopped or the database is restarted.
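To illustrate what "queue requests" means here, a minimal Python sketch follows. It is not the behavior of any particular pooler: the class `BoundedGate`, the function `simulate`, and the numbers used (64 callers, a limit of 16, a 10 ms unit of work) are hypothetical and chosen only for illustration. The idea is that callers beyond the limit wait for a free slot instead of opening another connection.

```python
import threading
import time


class BoundedGate:
    """Admit at most `limit` callers at once; the rest wait in line."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # callers currently doing work
        self.peak = 0    # highest value `active` ever reached

    def run(self, work):
        with self._slots:  # blocks (queues) while all slots are busy
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return work()
            finally:
                with self._lock:
                    self.active -= 1


def simulate(requests=64, limit=16, work_seconds=0.01):
    """Push `requests` concurrent callers through a gate of size `limit`."""
    gate = BoundedGate(limit)

    def work():
        time.sleep(work_seconds)  # stand-in for running a query

    threads = [threading.Thread(target=gate.run, args=(work,))
               for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate.peak  # never exceeds `limit`
```

However many callers arrive at once, the number doing work simultaneously stays at or below the limit, so a slowdown lengthens the wait in the queue rather than adding to the number of active connections.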

>> I don't suppose you have vmstat 1 output from the incident? If it
>> happens again, try to capture that.
>
> Are you looking for a stat in particular?

Not really; what I like about `vmstat 1` is how many useful pieces of information are on each line, allowing me to get a good overview of what's going on. For example, if system CPU time is high, it is very likely to be a problem with transparent huge pages, which is one thing that can cause these symptoms. A "write glut" can also do so, which can be controlled by adjusting checkpoint and background writer settings, plus the OS vm.dirty_* settings (and maybe keeping shared_buffers smaller than you otherwise might). NUMA problems are not at issue, since there is only one memory node.
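As a rough aid for reading captured `vmstat 1` output, the sketch below picks out samples where system CPU time is high. It assumes the usual Linux (procps) column layout, in which the header row contains `us` and `sy`. The function names and the default threshold of 30 percent are hypothetical, not values suggested in this thread.

```python
def parse_vmstat(lines):
    """Yield one dict per vmstat sample row, keyed by column name."""
    columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "us" in fields and "sy" in fields:
            columns = fields  # column-name header row (repeats periodically)
            continue
        # Keep only purely numeric rows that match the header width;
        # this skips the dashed banner line and any malformed rows.
        if (columns and len(fields) == len(columns)
                and all(f.isdigit() for f in fields)):
            yield dict(zip(columns, map(int, fields)))


def high_system_cpu(lines, threshold=30):
    """Return the samples whose system CPU percentage exceeds `threshold`."""
    return [row for row in parse_vmstat(lines) if row["sy"] > threshold]
```

If the output was saved to a file during the incident, passing the open file object to `high_system_cpu` returns the matching samples. The `cs` and `r` values in each one are worth reading alongside `sy`.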

Without more evidence of what is causing the problem, suggestions for a solution are shots in the dark.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

