Re: terminate called after throwing an instance of 'std::bad_alloc'

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: terminate called after throwing an instance of 'std::bad_alloc'
Date: 2020-10-03 23:01:49
Message-ID: 20201003230149.mtd7fjsjwgii3jv7@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-10-03 17:02:36 -0500, Justin Pryzby wrote:
> We didn't use JIT with v12, but I've been setting jit=on along with v13
> upgrades, so I was trying to convince myself that there's a change in v13, and
> not just our config..
>
> I've seen repeated OOMs on a small number of reports. It looks like JIT
> optimization/inlining used a good deal of RAM in v12, and even more on v13.
> Maybe that's what's expected; it looks like this scales with the number of
> partitions and number of functions, so it could be arbitrarily large, and not
> respectful of work_mem (below, at the 4MB default).

> In v13:
> With optimize_above=-1, inline_above=-1: 333472 kB max resident size
> With optimize_above=1, inline_above=1: 654036 kB max resident size
>
> In v12:
> With optimize_above=-1, inline_above=-1: 330988 kB max resident size
> With optimize_above=1, inline_above=1: 556844 kB max resident size

That's probably to some degree just the shared library being loaded /
faulted in (see below for more detail). I don't think there's any
difference between 12/13 in what inlining / optimization does, so I
don't quite understand the difference.

> The OOM itself could be due to our JIT settings and maybe not a bug. My
> thinking is that when we hit OOM, that sometimes manifests itself as the C++
> exception, which somehow trickles up into an SIGABRT in postgres.

C++ code throws an exception when memory allocation fails, unless a
different error handler is set. For JITing we install a different error
handler in LLVM (which doesn't throw, but FATALs), but that doesn't
appear to be done for all the postgis code. Nor does postgis - as it
really should - catch the exception. An unhandled exception is treated
as an abort(), which in turn raises SIGABRT.
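
To make that concrete, here's a minimal standalone C++ sketch of the
mechanism (illustrative only - this is not the actual postgres, LLVM or
postgis code):

#include <cstdio>
#include <new>

/* stand-in for an allocation failing somewhere inside C++ library code */
static void
library_code(void)
{
    throw std::bad_alloc();
}

int
main(void)
{
#ifdef HANDLE_IT
    /* what a caller of C++ code should do: catch the exception and turn
     * it into a proper error (in a backend: ereport(ERROR, ...)) */
    try
    {
        library_code();
    }
    catch (std::bad_alloc &)
    {
        std::fprintf(stderr, "out of memory\n");
        return 1;
    }
#else
    /*
     * Without a handler the exception propagates out of main(), the C++
     * runtime prints "terminate called after throwing an instance of
     * 'std::bad_alloc'", calls abort(), and the process dies with
     * SIGABRT - i.e. $subject.
     */
    library_code();
#endif
    return 0;
}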

> For example, look at this huge backend, which is connected to the process
> which has been experiencing OOM. The backend is 4.2GB while idle, so I'd have
> hoped for it to have released everything it could. I'm aware that free()
> usually cannot return small allocations to the kernel, but that should not
> lead to continued growth over time.

The below turned out to be somewhat obsoleted by what you wrote below,
which I unfortunately hadn't yet read - but I think it's still good
information, so I'm not gonna delete it:

RES includes all the shared buffers that a backend touched, unless you
use huge pages. At least on linux RES is unfortunately pretty much
useless for postgres. You can gain a bit more information by looking at
/proc/$pid/status and /proc/$pid/smaps_rollup

E.g. here is a backend without having run any JIT (pruned all of these):

andres(at)awork3:~$ grep ^Rss /proc/2810440/status
RssAnon: 2324 kB
RssFile: 6932 kB
RssShmem: 4868 kB

andres(at)awork3:~$ grep ^Pss /proc/2810440/smaps_rollup
Pss: 3912 kB
Pss_Anon: 1177 kB
Pss_File: 1004 kB
Pss_Shmem: 1730 kB

and after loading the llvmjit library (LOAD 'llvmjit'), but not actually
jitting anything:

andres(at)awork3:~$ grep ^Rss /proc/2810440/status
RssAnon: 7528 kB
RssFile: 104844 kB
RssShmem: 5204 kB

andres(at)awork3:~$ grep ^Pss /proc/2810440/smaps_rollup
Pss: 57773 kB
Pss_Anon: 6436 kB
Pss_File: 49606 kB
Pss_Shmem: 1730 kB

Note how RSS has increased by ~100MB - but all of that is
file-backed. It's just the shared library being loaded (and not even all
of it is required to be in memory). And if you look at the 'Pss' (the p
stands for proportional) values you can see that they're lower, because
the library is shared between all the processes that have JIT loaded -
in my case here that was just two sessions, hence roughly halved.
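
As a rough rule of thumb, per mapping:

    Pss ≈ private_rss + shared_rss / number_of_processes_mapping_it

so the ~100MB of file-backed library pages, mapped by two backends here,
show up as roughly 50MB in each backend's Pss_File (the 49606 kB above),
while RssFile still reports the full ~105MB in each.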

The actually allocated (anonymous) memory does differ between the two,
but only ~2.3MB vs ~7.5MB.

(You can also see that a postgres backend doesn't actually weigh even
those 2MB on its own, because a lot of the memory allocations are shared
between postmaster and backends.)

And if you additionally touch buffers in shared buffers, RSS will also
include those. E.g.

postgres[2810440][1]=# SHOW shared_buffers;
┌────────────────┐
│ shared_buffers │
├────────────────┤
│ 2GB            │
└────────────────┘
(1 row)

postgres[2810325][1]=# \dt+ large
List of relations
┌────────┬───────┬───────┬────────┬─────────────┬───────────────┬────────┬─────────────┐
│ Schema │ Name  │ Type  │ Owner  │ Persistence │ Access Method │ Size   │ Description │
├────────┼───────┼───────┼────────┼─────────────┼───────────────┼────────┼─────────────┤
│ public │ large │ table │ andres │ permanent   │ heap          │ 501 MB │             │
└────────┴───────┴───────┴────────┴─────────────┴───────────────┴────────┴─────────────┘
(1 row)

andres(at)awork3:~$ grep ^Rss /proc/2810440/status
RssAnon: 8284 kB
RssFile: 107236 kB
RssShmem: 19184 kB
andres(at)awork3:~$ grep ^Pss /proc/2810440/smaps_rollup
Pss: 63282 kB
Pss_Anon: 7413 kB
Pss_File: 50034 kB
Pss_Shmem: 5834 kB

postgres[2810440][1]=# SELECT count(generate_series), count(random) FROM large;
...

andres(at)awork3:~$ grep ^Rss /proc/2810440/status
RssAnon: 8592 kB
RssFile: 107464 kB
RssShmem: 390328 kB
andres(at)awork3:~$ grep ^Pss /proc/2810440/smaps_rollup
Pss: 248889 kB
Pss_Anon: 7560 kB
Pss_File: 50069 kB
Pss_Shmem: 191259 kB

So now, despite only ~8.5MB of allocations, ps's RSS shows considerably
larger memory usage:

andres(at)awork3:~$ ps -u --pid 2810440
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
andres 2810440 0.2 0.2 2420584 506384 ? Ss 15:19 0:02 postgres: dev assert: andres postgres [local] idle

> The postgres memory contexts don't show anything nearly that large.
> MessageContext: 25236472 total in 20 blocks; 2990656 free (4 chunks); 22245816 used
> CacheMemoryContext: 25892160 total in 42 blocks; 22770984 free (28285 chunks); 3121176 used
> Grand total: 70879896 bytes in 1466 blocks; 29518440 free (29097 chunks); 41361456 used

I wonder if it'd be worth trying to redirect all LLVM allocations into a
memory context. Not to be able to free them, but to be able to see them
more clearly. But it looks like it'd be too fragile to do so.
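
Just to illustrate the idea, here's a rough standalone sketch (not a
patch, and probably part of why it's fragile: it hooks the process-wide
allocation operators, and only makes the C++-side allocation volume
visible via a counter rather than routing it into a real MemoryContext):

#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

/* running total of live C++ allocations, to dump alongside
 * MemoryContextStats() output */
static std::atomic<std::size_t> cxx_alloc_bytes{0};

/* header large enough to keep the returned pointer suitably aligned */
static constexpr std::size_t HDR = alignof(std::max_align_t);

void *
operator new(std::size_t size)
{
    /* over-allocate a header so operator delete can subtract again */
    void *raw = std::malloc(size + HDR);

    if (raw == nullptr)
        throw std::bad_alloc();
    *static_cast<std::size_t *>(raw) = size;
    cxx_alloc_bytes += size;
    return static_cast<char *>(raw) + HDR;
}

void
operator delete(void *ptr) noexcept
{
    if (ptr == nullptr)
        return;

    char *raw = static_cast<char *>(ptr) - HDR;

    cxx_alloc_bytes -= *reinterpret_cast<std::size_t *>(raw);
    std::free(raw);
}

int
main(void)
{
    int *p = new int[1000];

    std::printf("live C++ allocations: %zu bytes\n", cxx_alloc_bytes.load());
    delete[] p;
    std::printf("live C++ allocations: %zu bytes\n", cxx_alloc_bytes.load());
    return 0;
}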

> I was able to make a small, apparent leak like so. This applies to
> postgis3.0/postgres12/LLVM5/centos7, and
> postgis3.1alpha/postgres13/LLVM9/centos8.
>
> Inlining is essential to see the leak, optimization is not. Showing every 9th
> query:
>
> $ for a in `seq 1 99`; do echo "SET jit_inline_above_cost=1; SET jit_optimize_above_cost=-1; SET jit_above_cost=0; SET jit_expressions=on; SET log_executor_stats=on; SET client_min_messages=debug; SET jit=on; explain analyze SELECT st_x(site_geo) FROM sites WHERE site_geo IS NOT NULL;"; done |psql ts 2>&1 |grep 'resident' |sed -n '1~9p'
> ! 46024 kB max resident size
> ! 46492 kB max resident size

Huh, that's certainly not good. Looking.

Greetings,

Andres Freund
