Re: BUG #16264: Server closed the connection unexpectedly

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robin Duquette <robin(dot)duquette(at)pyxidr(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16264: Server closed the connection unexpectedly
Date: 2020-02-20 00:09:31
Message-ID: 2549.1582157371@sss.pgh.pa.us
Lists: pgsql-bugs

Robin Duquette <robin(dot)duquette(at)pyxidr(dot)com> writes:
> 1. I'm running macOS 10.15.3
> 2. Everything was working fine with PostgreSQL version 12.1, but since
> I installed 12.2 on two machines (with identical OS), many queries
> cause the server process to be terminated by signal 9 (according to the
> log). This is the case on both machines running 12.2. Unfortunately,
> the log doesn't say more than that (see the extract below).

Huh, interesting. Signal 9 (SIGKILL) is an externally-imposed process
termination, rather than an internal failure. We've seen one similar
report recently:

https://www.postgresql.org/message-id/flat/CEF2C288-13E6-4727-81D0-0775F40F313B%40arcict.com

and as mentioned there, the most likely theory is that the backend process
is consuming an unreasonable amount of memory and the SIGKILL is coming
from a system-level out-of-memory defense mechanism. I hadn't thought
that macOS did that, but it looks like I'm finding out differently.

That does not get us a whole lot closer to identifying the cause, though.
It's certainly believable that we introduced some kind of memory leak
between 12.1 and 12.2, but that's not enough info to find it.

First things first though. Can you watch the system with "top" or
Activity Monitor and confirm or disprove that there's a memory
consumption issue before the SIGKILL? We ought to be sure about
that before we go spending a lot of time.
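
If watching "top" by hand is awkward, something along these lines could
log the resident memory of the postgres processes at intervals. This is
only a rough sketch: the function names are made up for illustration, it
assumes a "ps" that accepts -A and -o with the pid, rss and command
keywords (macOS and Linux should both qualify), and it assumes that rss
is reported in kilobytes.

```python
import subprocess
import time


def parse_ps_output(text):
    """Return (pid, rss_kb, command) for postgres processes, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip blank or unexpected lines
        pid, rss_kb, command = int(parts[0]), int(parts[1]), parts[2]
        if "postgres" in command:
            rows.append((pid, rss_kb, command))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def sample():
    """Take one snapshot of postgres processes using ps."""
    # Separate -o options, each with an empty header, so no header line is printed.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "rss=", "-o", "command="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)


def watch(interval_seconds=5, count=60):
    """Print a timestamped snapshot every interval_seconds, count times."""
    for _ in range(count):
        stamp = time.strftime("%H:%M:%S")
        for pid, rss_kb, command in sample():
            print(f"{stamp} pid={pid} rss={rss_kb / 1024:.1f} MB {command[:60]}")
        time.sleep(interval_seconds)
```

Calling watch() in one terminal while running the failing query in
another should show whether one backend's memory climbs steadily before
the SIGKILL.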

If that does seem to be the case, launching the postmaster under a
restrictive ulimit (maybe "ulimit -v 1000000" or so) could be a
second step. That ought to help reduce the problem from a SIGKILL
to a normal out-of-memory error, which not only would make things
a bit more stable for you, but it should allow the failing query
to dump a memory map to the postmaster's stderr, which would give
us a little more to go on about where the leak is.
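
As an alternative to typing the ulimit in the shell, here is a rough
sketch of a wrapper that lowers the address-space limit and then starts
the server. It assumes that "ulimit -v" counts in kilobytes and
corresponds to RLIMIT_AS, that Python's resource module exposes
RLIMIT_AS on your platform, and that the kernel actually enforces the
limit, which is platform-dependent and worth checking. The function
names and the data directory path are placeholders.

```python
import os
import resource


def soft_limit_bytes(limit_kib, hard):
    """Convert a "ulimit -v" style value (KiB) to bytes, capped at the hard limit."""
    wanted = limit_kib * 1024
    if hard != resource.RLIM_INFINITY and wanted > hard:
        return hard
    return wanted


def exec_with_address_space_limit(argv, limit_kib):
    """Lower the soft RLIMIT_AS for this process, then replace it with argv."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = soft_limit_bytes(limit_kib, hard)
    # Only the soft limit is lowered; the hard limit is left as it was.
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    os.execvp(argv[0], argv)  # does not return on success
```

For example, exec_with_address_space_limit(["postgres", "-D",
"/path/to/data"], 1000000) would start the postmaster with roughly a
1GB limit, which its child backends inherit. Keep stderr pointed
somewhere you can read afterwards, since that is where the memory map
would end up.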

In the end, though, I'm afraid we might have to ask you to produce
a reproducible test case of a query that consumes excessive memory.
These things can be very hard to identify without digging into it
with a debugger.

regards, tom lane
