Re: terminate called after throwing an instance of 'std::bad_alloc'

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: terminate called after throwing an instance of 'std::bad_alloc'
Date: 2020-10-07 23:02:42
Message-ID: 20201007230242.rnwsbnxywvthef3w@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Sorry for the second send of this email, but somehow I managed to mangle
the headers in the last version I sent.

On 2020-10-03 19:01:27 -0700, Andres Freund wrote:
> On 2020-10-03 18:30:46 -0500, Justin Pryzby wrote:
> > On Sat, Oct 03, 2020 at 04:01:49PM -0700, Andres Freund wrote:
> > > > I was able to make a small, apparent leak like so. This applies to
> > > > postgis3.0/postgres12/LLVM5/centos7, and
> > > > postgis3.1alpha/postgres13/LLVM9/centos8.
> > > >
> > > > Inlining is essential to see the leak, optimization is not. Showing every 9th
> > > > query:
> > > >
> > > > $ for a in `seq 1 99`; do echo "SET jit_inline_above_cost=1; SET jit_optimize_above_cost=-1; SET jit_above_cost=0; SET jit_expressions=on; SET log_executor_stats=on; SET client_min_messages=debug; SET jit=on; explain analyze SELECT st_x(site_geo) FROM sites WHERE site_geo IS NOT NULL;"; done |psql ts 2>&1 |grep 'resident' |sed -n '1~9p'
> > > > ! 46024 kB max resident size
> > > > ! 46492 kB max resident size
> > >
> > > Huh, that's certainly not good. Looking.
> >
> > Reproduce like:
> > postgres=# CREATE EXTENSION postgis;
> > postgres=# CREATE TABLE geo(g geometry(point));
> > postgres=# INSERT INTO geo SELECT st_GeometryFromText('POINT(11 12)',0) FROM generate_series(1,99999);
> > postgres=# SET jit_above_cost=0; SET jit_inline_above_cost=0; SET client_min_messages=debug; SET log_executor_stats=on; explain analyze SELECT st_x(g) FROM geo;
>
> It's an issue inside LLVM. I'm not yet sure how to avoid it. Basically
> whenever we load an IR module to inline from, it renames its type names
> to make them not conflict with already known type names in the current
> LLVMContext. That, over time, leads to a large number of type names in
> memory. We can avoid the re-loading of the module by instead cloning it,
> but that still leads to new types being created for some internal
> reasons. It's possible to create/drop separate LLVMContext instances,
> but we'd have to figure out when to do so - it's too expensive to always
> do that.

I spent a lot of time (probably too much time) trying to find a good way
out of that. I think I mostly figured it out, but it's gonna take me a
bit longer to nail down the loose ends. The nice part is that it
actually speeds up inlining a lot. The bad part is that it's not a small
change, so I don't yet know how to deal with existing releases.

Details:

The issue with the leaks is basically that when LLVM loads a bitcode
module it doesn't "unify" named types with types in other bitcode module
- they could after all be entirely independent. That means that we end
up with a lot of types after a while of running.

That ought to be addressable by not re-loading modules we inline from
from disk every time we inline, but unfortunately that doesn't work well
either: The "cross module import" logic unfortunately modifies the types
used in the "source module", leading to subsequent imports not working
well. That last issue is why I had moved to re-loading modules from
"disk" during import.

The partial patch I've written doesn't use the existing cross module
import code anymore, but moves to the lower level functionality it
uses. For traditional compilers, which is what that linker is mainly
written for, it makes sense to reduce memory usage by "destructively"
moving code from the source into the target - it'll only be used
once. But for the JIT case it's better to do so non-destructively. That
requires us to implement some "type resolution" between the source and
target modules, but then gains not needing to reload / copy the source
module anymore, saving a lot of the costs.

It's possible that for existing releases we ought to go for a simpler
approach, even if it's not as good. The only really feasible way I see
is to basically count the number of times inlining has been done, and
then just re-initialize after a certain time. But that's pretty darn
yucky.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-10-07 23:48:18 Re: Deleting older versions in unique indexes to avoid page splits
Previous Message Andres Freund 2020-10-07 23:01:14 Re: terminate called after throwing an instance of 'std::bad_alloc'