Re: Don't clean up LLVM state when exiting in a bad way

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Don't clean up LLVM state when exiting in a bad way
Date: 2021-09-06 02:33:56
Message-ID: 20210906023356.GG26465@telsasoft.com
Lists: pgsql-hackers

On Wed, Aug 18, 2021 at 03:00:59PM +0000, Jelte Fennema wrote:
> I ran into some segfaults when using Postgres that was compiled with LLVM 7. According to the backtraces, these crashes happened during the call to llvm_shutdown, during cleanup after another out-of-memory condition. It seems that calls to LLVMOrcDisposeInstance can crash (at least on LLVM 7) when LLVM is left in a bad state. I attached the relevant part of the stacktrace to this email.
>
> With the attached patch these segfaults went away. The patch turns llvm_shutdown into a no-op whenever the backend is exiting with an error. Based on my understanding of the code this should be totally fine. No memory should be leaked, since all memory will be cleaned up anyway once the backend exits shortly after. The only reason this cleanup code even seems to exist at all is to get useful LLVM profiling data. To me it seems acceptable if the profiling data is incorrect/missing when the backend exits with an error.

Andres, could you comment on this?

This seems to explain the crash I reported to you when testing your WIP patches
for the JIT memory leak. I realize now that the crash happens without your
patches.
https://www.postgresql.org/message-id/20210419164130.byegpfrw46mzagcu@alap3.anarazel.de

I can reproduce the crash on master (not just v13, as I said before) compiled
on CentOS 7, with:
LLVM_CONFIG=/usr/lib64/llvm7.0/bin/llvm-config CLANG=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang

I cannot reproduce the crash after applying Jelte's patch.
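
For anyone following along, the idea behind the patch can be sketched roughly
as below. This is a simplified, standalone illustration and not the patch
itself: sketch_llvm_shutdown(), dispose_jit_state() and jit_state_disposed are
hypothetical stand-ins, and treating a nonzero exit code as "exiting with an
error" is an assumption based on the description quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for the JIT state owned by the backend. */
static bool jit_state_disposed = false;

/*
 * Hypothetical stand-in for the real teardown (the step that would call
 * LLVMOrcDisposeInstance).  Here it only records that cleanup ran.
 */
static void
dispose_jit_state(void)
{
	assert(!jit_state_disposed);	/* cleanup must not run twice */
	jit_state_disposed = true;
}

/*
 * Exit callback modeled on the behavior described for llvm_shutdown().
 * "code" is the status the process is exiting with.  Assumption: nonzero
 * means an error exit, in which case LLVM may be in a bad state and is
 * left untouched; the OS reclaims the memory when the process exits.
 */
static void
sketch_llvm_shutdown(int code)
{
	if (code != 0)
		return;

	dispose_jit_state();
}
```

With this shape, an error exit skips the teardown entirely, and only a clean
exit runs it (which is the case where the profiling data matters).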

I couldn't reproduce the crash on Ubuntu either, so maybe they have a patch
which fixes this, or maybe RH applied a patch which caused it...

postgres=# CREATE TABLE t AS SELECT i FROM generate_series(1,999999)i; VACUUM ANALYZE t;
postgres=# SET client_min_messages=debug; SET statement_timeout=333; SET jit_above_cost=0; SET jit_optimize_above_cost=-1; SET jit_inline_above_cost=-1; explain analyze SELECT sum(i) FROM t a NATURAL JOIN t b;
2021-09-05 22:47:12.807 ADT client backend[7563] psql ERROR: canceling statement due to statement timeout
2021-09-05 22:47:12.880 ADT postmaster[7272] LOG: background worker "parallel worker" (PID 8212) was terminated by signal 11: Segmentation fault

--
Justin
