Memory leak (possibly connected to postgis) leading to server crash

From: Roman Cervenak <roman(at)cervenak(dot)info>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Memory leak (possibly connected to postgis) leading to server crash
Date: 2019-12-06 09:22:42
Message-ID: CAGjExY1C40d=cqbf6nfeNDCLWk57MhFu3mgR0+KRPb8+_uW_KA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,
this may be a serious issue and I am not very experienced with reporting
this kind of stuff, I will try my best. I believe there may be a memory
leak somewhere in postgres/postgis, which can eat up all the available
memory, leading to postgres crash (signal 9 kill of backend, postmaster
terminating others, recovery).

My setup and workload:
Postgres 12.1 (Ubuntu 12.1-1.pgdg18.04+1) installed via apt on Ubuntu
server 18.04 LTS, in VM in Azure cloud.
PostGIS version: 3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
VM has 8cores, 56 GB RAM, 7 TB RAID from managed disks (azure network
storage)

shared_buffers = 12GB
work_mem = 256MB
maintenance_work_mem = 2GB

I was running 8 clients (on different VMs, over network) using the
database, batch processing geographic stuff. Each worker is running one
query at a time, so pretty much 8 parallel queries 100% of the time.
Queries themselves are fairly short (5-60 seconds), SELECTing rows by
geometry index (GIST) and doing stuff like ST_SimplifyPreserveTopology,
ST_Buffer, ST_Union on them, essentially using all 8 cores to 100%. There
is a lot of things to process, so this was going on for maybe 12 hours,
when I noticed (in "htop") that memory usage is unusually high, somewhere
about 50 GB. It was suspicious, because it is way higher than it should be
with 12 GB shared buffers and 256MB work_mem with 8 clients, and it
continued to grow slowly (I could see numbers in RES column in htop slowly
rise for all backends). Until eventually:

2019-12-06 00:04:24 UTC [21487-8] LOG: server process (PID 52059) was
terminated by signal 9: Killed
2019-12-06 00:04:24 UTC [21487-10] LOG: terminating any other active
server processes
2019-12-06 00:04:24 UTC [91091-1] WARNING: terminating connection because
of crash of another server process
2019-12-06 00:04:24 UTC [91091-2] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
...
FATAL: the database system is in recovery mode
...

I realize, this is probably not enough for you. But I will be running this
workload again soon, so if this repeats, what should I do to help you
diagnose this?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-12-06 10:33:48 BUG #16156: bugs security
Previous Message Etsuro Fujita 2019-12-06 08:27:31 Re: BUG #16139: Assertion fails on INSERT into a postgres_fdw' table with two AFTER INSERT triggers