Re: BUG #16160: Minor memory leak in case of starting postgres server with SSL encryption

From: Jelte Fennema <postgres(at)jeltef(dot)nl>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: duspensky(at)ya(dot)ru, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16160: Minor memory leak in case of starting postgres server with SSL encryption
Date: 2021-03-16 15:34:17
Message-ID: CAGECzQTucR20CLwb6mw_JLdVST8GqYNqYr1OmAnNS2Byg8r+dg@mail.gmail.com
Lists: pgsql-bugs

We ran into this memory leak on PG11 in production. The leak was determined
to be the root cause of OOM errors we were seeing. A combination of things
caused this leak to become serious enough for these OOM errors to happen:

1. Very frequent SIGHUPs (every minute). Over the course of a few months,
this causes the leak to accumulate to a significant amount of memory (MBs
instead of KBs).
2. A fairly high number of connections held open by the workload (~150
connections). Each of these connections would start with the cumulative
leaked memory as copy-on-write memory. This multiplied the leak into
multiple GBs of copy-on-write memory.
3. We run Linux with vm.overcommit_memory=2. This causes copy-on-write
memory that isn't changed to effectively count towards allocated memory.
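
How these three factors multiply can be sketched with a short
back-of-the-envelope calculation. This is only a rough illustration: the
per-reload leak size is a hypothetical placeholder (the real figure is not
stated here), the function name is made up for this sketch, and the result
is an upper bound that assumes every backend was forked after the leak had
accumulated. Only the reload interval (one per minute) and the connection
count (~150) come from the list above.

```python
# Rough estimate of how a small per-reload leak in the postmaster turns into
# a large committed-memory footprint across forked backends.
# ASSUMPTION: the per-reload leak size used below is hypothetical, not measured.

def estimate_committed_leak(leak_per_reload_bytes, reloads, backends):
    """Return (leak in the postmaster, leak charged across all backends) in bytes."""
    postmaster_leak = leak_per_reload_bytes * reloads
    # Each forked backend inherits the leaked pages copy-on-write. With
    # vm.overcommit_memory=2 they are charged to every process, even though
    # the pages are never written to.
    committed_leak = postmaster_leak * backends
    return postmaster_leak, committed_leak


if __name__ == "__main__":
    reloads = 60 * 24 * 90   # one SIGHUP per minute for roughly three months
    backends = 150           # ~150 open connections
    leak_per_reload = 200    # hypothetical: 200 bytes leaked per reload
    postmaster, committed = estimate_committed_leak(
        leak_per_reload, reloads, backends
    )
    print(f"leak in postmaster:        {postmaster / 2**20:.1f} MiB")
    print(f"charged over all backends: {committed / 2**30:.2f} GiB")
```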

To clarify the context a bit more, in case you're not familiar with the
details of vm.overcommit_memory: there's "used" memory and "Committed_AS"
memory. The copy-on-write memory in all backends is counted towards
"Committed_AS" memory. "used" memory does not increase for every backend,
because it's copy-on-write and none of the backends write to this memory
(since it's leaked, there's no live pointer to it).

Linux puts a hard limit on Committed_AS, because we use
vm.overcommit_memory=2 (which means memory overcommitting is disabled). If
we had memory overcommitting enabled, then this memory leak wouldn't be a
real problem. The amount of "used" memory is pretty much negligible. It
only becomes a problem because its Committed_AS is multiplied for every
process, and we care about Committed_AS because overcommitting is disabled.
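
To see how close a machine is to this limit, one can compare the
CommitLimit and Committed_AS fields that the kernel reports in
/proc/meminfo. Below is a minimal sketch, assuming a Linux system; the
helper names are hypothetical and chosen only for this example. Note that
the kernel only enforces CommitLimit when vm.overcommit_memory=2.

```python
# Minimal sketch: parse /proc/meminfo-style text and report commit headroom.
# The helper names are hypothetical, chosen for this example.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> int value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # most fields are in kB
    return fields


def commit_headroom_kb(fields):
    """Return CommitLimit - Committed_AS in kB (may be negative in modes 0/1)."""
    return fields["CommitLimit"] - fields["Committed_AS"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        print("/proc/meminfo is not available on this system")
    else:
        print("CommitLimit: ", info["CommitLimit"], "kB")
        print("Committed_AS:", info["Committed_AS"], "kB")
        print("headroom:    ", commit_headroom_kb(info), "kB")
```

Watching the headroom shrink over time while "used" memory stays flat
would be consistent with the kind of leak described above.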

It would be great if this could be backpatched to all currently supported
PG versions. The patch is very small, so I think it should take very little
effort. I'd be happy to help with that if that's useful or needed.

On Tue, 16 Mar 2021 at 16:09, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Fri, Dec 13, 2019 at 03:39:15PM +0900, Michael Paquier wrote:
> > Attached is a patch, I'll go commit that if there are no objections.
> > The DH handling does not really change regarding the way it gets
> > free'd or not down to 0.9.8.
>
> And committed. Dmitry has pointed out offline that we need to do the
> same with the error code path, and he is right as OpenSSL does not
> touch the passed-in DH information for 0.9.8~.
> --
> Michael
>
