Re: JIT compiling with LLVM v9.0

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JIT compiling with LLVM v9.0
Date: 2018-01-27 21:20:37
Message-ID: 20180127212037.qrsq2hhfymhpsmjg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-01-26 22:52:35 -0800, Jeff Davis wrote:
> The version of LLVM that I tried this against had a linker option
> called "InternalizeLinkedSymbols" that would prevent the visibility
> problem you mention (assuming I understand you correctly).

I don't think they're fully solvable - you can't really internalize a
reference to a mutable static variable in another translation
unit. Unless you modify that translation unit, which doesn't work when
postgres running.

> That option is no longer there so I will have to figure out how to do
> it with the current LLVM API.

Look at the llvmjit_wrap.c code invoking FunctionImporter - that pretty
much does that. I'll push a cleaned up version of that code sometime
this weekend (it'll then live in llvmjit_inline.cpp).

> > Afaict that's effectively what I've already implemented. We could export
> > more input as constants to the generated program, but other than that...
>
> I brought this up in the context of slot_compile_deform(). In your
> patch, you have code like:
>
> + if (!att->attnotnull)
> + {
> ...
> + v_nullbyte = LLVMBuildLoad(
> + builder,
> + LLVMBuildGEP(builder, v_bits,
> + &v_nullbyteno, 1, ""),
> + "attnullbyte");
> +
> + v_nullbit = LLVMBuildICmp(
> + builder,
> + LLVMIntEQ,
> + LLVMBuildAnd(builder, v_nullbyte,
> v_nullbytemask, ""),
> + LLVMConstInt(LLVMInt8Type(), 0, false),
> + "attisnull");
> ...
>
> So it looks like you are reimplementing the generic code, but with
> conditional code gen. If the generic code changes, someone will need
> to read, understand, and change this code, too, right?

Right. Not that that's code that has changed that much...

> With my approach, then it would initially do *un*conditional code gen,
> and be less efficient and less specialized than the code generated by
> your current patch. But then it would link in the constant tupledesc,
> and optimize, and the optimizer will realize that they are constants
> (hopefully) and then cut out a lot of the dead code and specialize it
> to the given tupledesc.

Right.

> This places a lot of faith in the optimizer and I realize it may not
> happen as nicely with real code as it did with my earlier experiments.
> Maybe you already tried and you are saying that's a dead end? I'll
> give it a shot, though.

I did that, yes. There's two major downsides:

a) The code isn't as efficient as the handrolled code. The handrolled
code e.g. can take into account that it doesn't need to access the
NULL bitmap for a NOT NULL column and we don't need to check the
tuple's number of attributes if there's a following NOT NULL
attribute. Those safe a good number of cycles.

b) The optimizations to take advantage of the constants and make the
code faster with the constant tupledesc is fairly slow (you pretty
much need at least an -O2 equivalent), whereas the handrolled tuple
deforming is faster than the slot_getsomeattrs with just a single,
pretty cheap, mem2reg pass. We're talking about ~1ms vs 70-100ms in
a lot of cases. The optimizer often will not actually unroll the
loop with many attributes despite that being beneficial.

I think in most cases using the approach you advocate makes sense, to
avoid duplication, but tuple deforming is such a major bottleneck that I
think it's clearly worth doing it manually. Being able to use llvm with
just a always-inline and a mem2reg pass makes it so much more widely
applicable than doing the full inlining and optimization work.

> >> I experimented a bit before and it works for basic cases, but I'm not
> >> sure if it's as good as your hand-generated LLVM.
> >
> > For deforming it doesn't even remotely get as good in my experiments.
>
> I'd like some more information here -- what didn't work? It didn't
> recognize constants? Or did recognize them, but didn't optimize as
> well as you did by hand?

It didn't optimize as well as I did by hand, without significantly
complicating (and slowing) the originating the code. It sometimes
decided not to unroll the loop, and it takes a *lot* longer than the
direct emission of the code.

I'm hoping to work on making more of the executor JITed, and there I do
think it's largely going to be what you're proposing, due to the sheer
mass of code.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-01-27 21:45:41 Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Previous Message David Rowley 2018-01-27 21:09:15 Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?