Re: BUG #16971: Incompatible datalayout errors with llvmjit

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Stellard <tstellar(at)redhat(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: BUG #16971: Incompatible datalayout errors with llvmjit
Date: 2021-04-20 22:52:28
Message-ID: 20210420225228.qr4x6zv3hqjorh5t@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-04-20 14:42:28 -0700, Tom Stellard wrote:
> On 4/20/21 12:29 PM, Andres Freund wrote:
> > On 2021-04-19 18:29:52 +0000, PG Bug reporting form wrote:
> > > In our Fedora builds, we are getting errors[1] in the postgresql tests due
> > > to incompatible datalayouts between the JIT engine and the LLVM modules
> > > being compiled. The problem is that the JIT engine is being created with
> > > host specific CPU and features, while the datalayout for the compiled module
> > > is being taken from llvmjit_types.bc which is compiled without any specified
> > > CPU type or features.
> >
> > It's very odd that features would change the data layout - analogizing
> > with plain C code that'd mean that you cannot link a binary compiled
> > with something like -mavx2 against a library compiled without. To me
> > this smells like a bug somewhere lower level.

> You are correct that is odd, and to be honest, I didn't think that LLVM
> targets were allowed to change the datalayout based on the CPU type.

That was my impression...

> > Reformatting the error yields:
> > ERROR: failed to JIT module: Added modules have incompatible data layouts:
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-        a:8:16-n32:64 (module) vs
> > E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)
> >
> > The -v128:64 is about how to align vectors. Skimming the relevant LLVM
> > code I don't see why it'd be included in JIted code but not native code.
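
To make the difference between the two layout strings explicit, here is a minimal standard-library sketch that splits each string on '-' and lists the components found in one but not the other. The helper names split_layout and only_in are hypothetical, introduced only for this illustration; nothing here is LLVM or PostgreSQL code.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Split a datalayout string such as "E-m:e-i64:64" into its '-'-separated
// components, skipping empty ones.
static std::vector<std::string> split_layout(const std::string &layout) {
  std::vector<std::string> parts;
  std::stringstream ss(layout);
  std::string part;
  while (std::getline(ss, part, '-'))
    if (!part.empty())
      parts.push_back(part);
  return parts;
}

// Return the components that appear in `a` but not in `b`.
static std::vector<std::string> only_in(const std::string &a,
                                        const std::string &b) {
  const std::vector<std::string> pa = split_layout(a);
  const std::vector<std::string> pb = split_layout(b);
  std::vector<std::string> out;
  for (const auto &p : pa)
    if (std::find(pb.begin(), pb.end(), p) == pb.end())
      out.push_back(p);
  return out;
}
```

Called with the jit layout as the first argument and the module layout as the second, only_in returns the single component v128:64; with the arguments swapped it returns nothing.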

> The relevant code in LLVM is here:
> https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp#L88

Thanks for the pointer!

> I'm checking with upstream LLVM to see if this is allowed or not. However,
> this behavior is present in at least LLVM 11 and LLVM 12 (I haven't
> checked earlier versions), so postgresql will have to deal with this
> somehow.

Yea, seems we need to add a workaround for the issue, given how much
longer LLVM releases tend to be used than they are maintained. One
simple hack would be to add "-vector" to the list of features on s390x,
which afaict should avoid the issue for now?

In LLVM's main branch the code is this:

// Determine whether we use the vector ABI.
static bool UsesVectorABI(StringRef CPU, StringRef FS) {
  // We use the vector ABI whenever the vector facility is available.
  // This is the case by default if CPU is z13 or later, and can be
  // overridden via "[+-]vector" feature string elements.
  bool VectorABI = true;
  bool SoftFloat = false;
  if (CPU.empty() || CPU == "generic" ||
      CPU == "z10" || CPU == "z196" || CPU == "zEC12" ||
      CPU == "arch8" || CPU == "arch9" || CPU == "arch10")
    VectorABI = false;

  SmallVector<StringRef, 3> Features;
  FS.split(Features, ',', -1, false /* KeepEmpty */);
  for (auto &Feature : Features) {
    if (Feature == "vector" || Feature == "+vector")
      VectorABI = true;
    if (Feature == "-vector")
      VectorABI = false;
    if (Feature == "soft-float" || Feature == "+soft-float")
      SoftFloat = true;
    if (Feature == "-soft-float")
      SoftFloat = false;
  }

  return VectorABI && !SoftFloat;
}

So appending -vector should be sufficient?
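
As a quick check of that reading, here is a standalone restatement of the logic above using only the standard library (std::string in place of StringRef and SmallVector). The name uses_vector_abi is hypothetical and this is a sketch of the quoted function, not the LLVM code itself.

```cpp
#include <sstream>
#include <string>

// Restates the quoted UsesVectorABI() logic with std::string so it can be
// compiled without LLVM. Later feature entries override earlier ones.
static bool uses_vector_abi(const std::string &cpu,
                            const std::string &features) {
  bool vector_abi = true;
  bool soft_float = false;
  if (cpu.empty() || cpu == "generic" ||
      cpu == "z10" || cpu == "z196" || cpu == "zEC12" ||
      cpu == "arch8" || cpu == "arch9" || cpu == "arch10")
    vector_abi = false;

  // Walk the comma-separated feature list; empty entries match nothing.
  std::stringstream ss(features);
  std::string feature;
  while (std::getline(ss, feature, ',')) {
    if (feature == "vector" || feature == "+vector")
      vector_abi = true;
    if (feature == "-vector")
      vector_abi = false;
    if (feature == "soft-float" || feature == "+soft-float")
      soft_float = true;
    if (feature == "-soft-float")
      soft_float = false;
  }

  return vector_abi && !soft_float;
}
```

With this restatement, a z13 CPU and an empty feature string selects the vector ABI, while the same CPU with "-vector" appended at the end of the feature string does not, even if "+vector" appears earlier in the list.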

But we'd have to do so only after checking that there's a data layout
mismatch, because otherwise we'd just create a new problem if somebody
compiles with -march=native or such.
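
A minimal sketch of that conditional workaround, under stated assumptions: the two layout strings and the host feature string are taken as already-obtained inputs, and adjust_features is a hypothetical helper, not an existing PostgreSQL or LLVM function. How the strings are fetched from LLVM is left out.

```cpp
#include <string>

// Hypothetical helper: return the feature string to create the JIT target
// machine with. "-vector" is appended only on s390x and only if the module
// and JIT data layouts actually disagree, so a build whose bitcode already
// uses the vector ABI (e.g. compiled with -march=native) is left alone.
static std::string adjust_features(const std::string &features,
                                   bool is_s390x,
                                   const std::string &module_layout,
                                   const std::string &jit_layout) {
  if (!is_s390x || module_layout == jit_layout)
    return features;
  if (features.empty())
    return "-vector";
  return features + ",-vector";
}
```

Appending at the end matters because, per the function quoted above, later entries in the feature string override earlier ones.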

Greetings,

Andres Freund
