Re: JIT compiling with LLVM v10.1

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: JIT compiling with LLVM v10.1
Date: 2018-02-15 08:59:46
Lists: pgsql-hackers

On 14.02.2018 21:17, Andres Freund wrote:
> Hi,
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.
>> The layout is now as follows:
>> src/backend/jit/jit.c:
>> Part of JITing always linked into the server. Supports loading the
>> LLVM using JIT library.
>> src/backend/jit/llvm/
>> Infrastructure:
>> llvmjit.c:
>> General code generation and optimization infrastructure
>> llvmjit_error.cpp, llvmjit_wrap.cpp:
>> Error / backward compat wrappers
>> llvmjit_inline.cpp:
>> Cross module inlining support
>> Code-Gen:
>> llvmjit_expr.c
>> Expression compilation
>> llvmjit_deform.c
>> Deform compilation
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
> extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
> The other two biggest changes are that I've added a README
> and that I've revised the configure support so it does more error
> checks, and moved it into config/llvm.m4.
> There's a larger smattering of small changes too.
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
> Greetings,
> Andres Freund

I have run some more experiments on the efficiency of JIT-ing tuple
deforming and I want to share the results (I hope that they will be
It is a well-known fact that in sequential scan queries over warm data
Postgres spends a significant share of its time deforming tuples (17% in
the case of TPC-H Q1).
Postgres tries to optimize access to the tuple by caching field offsets
whenever they are fixed and by loading attributes on demand.
It is also a well-known recommendation to put fixed-size, non-null,
frequently used attributes at the beginning of a table's attribute list so
that this optimization works more efficiently.
The code of heap_deform_tuple shows that the first NULL value switches it
to "slow" mode:

    for (attnum = 0; attnum < natts; attnum++)
    {
        Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, attnum);

        if (hasnulls && att_isnull(attnum, bp))
        {
            values[attnum] = (Datum) 0;
            isnull[attnum] = true;
            slow = true;        /* can't use attcacheoff anymore */
            continue;
        }
        /* ... rest of the loop body omitted ... */
    }
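
To make the "fast" versus "slow" distinction concrete, here is a minimal sketch of the same idea on a toy tuple format. It is not the Postgres implementation: ToyTuple, toy_att_isnull and toy_deform are hypothetical names, every attribute is assumed to be a 4-byte integer, and the null bitmap is a single byte in which a set bit means "value present". As long as no NULL has been seen, the offset of attribute i is simply i * 4 (the cached offset). After the first NULL the offset has to be accumulated attribute by attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NATTS 4

/* Hypothetical, simplified tuple: null bitmap plus packed int32 values. */
typedef struct ToyTuple
{
    bool    hasnulls;
    uint8_t bits;       /* bit i set => attribute i is NOT null */
    uint8_t data[TOY_NATTS * sizeof(int32_t)];  /* non-null values only */
} ToyTuple;

static bool
toy_att_isnull(int attnum, uint8_t bits)
{
    return !(bits & (1 << attnum));
}

/* Returns true if the loop had to leave the cached-offset ("fast") path. */
static bool
toy_deform(const ToyTuple *tup, int32_t *values, bool *isnull)
{
    size_t  off = 0;
    bool    slow = false;

    assert(tup != NULL);

    for (int attnum = 0; attnum < TOY_NATTS; attnum++)
    {
        if (tup->hasnulls && toy_att_isnull(attnum, tup->bits))
        {
            values[attnum] = 0;
            isnull[attnum] = true;
            slow = true;    /* precomputed offsets are no longer valid */
            continue;
        }

        isnull[attnum] = false;
        if (!slow)
            off = (size_t) attnum * sizeof(int32_t);    /* "cached" offset */

        /* In slow mode, off is the running end of the previous value. */
        memcpy(&values[attnum], tup->data + off, sizeof(int32_t));
        off += sizeof(int32_t);
    }
    return slow;
}
```

In this toy model the slow path costs only one extra branch and one addition per attribute, which fits the observation later in this message that the penalty is small compared with memory access costs.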

I wanted to find out how important this optimization is and what the
actual penalty of "slow" mode is.
At the same time I wanted to understand how much JIT helps to speed up
tuple deforming.

I have populated three tables with data:

create table t1(id integer primary key,c1 integer,c2 integer,c3
integer,c4 integer,c5 integer,c6 integer,c7 integer,c8 integer,c9 integer);
create table t2(id integer primary key,c1 integer,c2 integer,c3
integer,c4 integer,c5 integer,c6 integer,c7 integer,c8 integer,c9 integer);
create table t3(id integer primary key,c1 integer not null,c2 integer
not null,c3 integer not null,c4 integer not null,c5 integer not null,c6
integer not null,c7 integer not null,c8 integer not null,c9 integer not
null);
insert into t1 (id,c1,c2,c3,c4,c5,c6,c7,c8) values
insert into t2 (id,c2,c3,c4,c5,c6,c7,c8,c9) values
insert into t3 (id,c1,c2,c3,c4,c5,c6,c7,c8,c9) values
vacuum analyze t1;
vacuum analyze t2;
vacuum analyze t3;

t1 contains NULL in the last column (c9), t2 in the first column (c1), and
t3 has all attributes declared NOT NULL (and JIT can use this knowledge to
generate more efficient deforming code).
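
As a rough illustration of why the NOT NULL declarations help, here is a hand-written sketch of what a deform routine specialized for a t3-like layout could look like. This is an assumption-laden toy, not the code LLVM actually emits: deform_fixed_int4 is a hypothetical name, the row is modeled as ten packed 4-byte integers with no header, and alignment and varlena handling are ignored. With no nullable and no variable-width columns, every offset is a compile-time constant and the null checks disappear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIXED_NATTS 10  /* id plus c1..c9, all int4 and NOT NULL */

/*
 * Hypothetical specialized deformer: extracts the first natts_needed
 * attributes. No null bitmap test and no "slow" flag are needed, because
 * attribute i always lives at byte offset i * 4.
 */
static void
deform_fixed_int4(const uint8_t *data, int32_t *values, int natts_needed)
{
    assert(natts_needed >= 0 && natts_needed <= FIXED_NATTS);

    for (int i = 0; i < natts_needed; i++)
        memcpy(&values[i], data + (size_t) i * sizeof(int32_t),
               sizeof(int32_t));
}
```

A compiler can fully unroll such a loop when natts_needed is known, which is the kind of specialization a JIT can apply per tuple descriptor.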
The whole data set is held in memory (shared buffers are larger than the
database) and I intentionally switched off parallel execution to make the
results more deterministic.
I ran two queries calculating aggregates on one or all of the not-null
fields:

select sum(c8) from t*;
select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8)
from t*;

As expected, 35% of the time was spent in heap_deform_tuple.
But the results (msec) were slightly confusing and unexpected:

select sum(c8) from t*;

      w/o JIT   with JIT
t1    763
t2    772

select sum(c2), sum(c3), sum(c4), sum(c5), sum(c6), sum(c7), sum(c8)
from t*;

      w/o JIT   with JIT
t1    1239      742
t2    1233      747
t3    1255      803

I repeated each query 10 times and took the minimal time (I think that it
is more meaningful than the average time, which depends on other
activity on the system).
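
The minimum-of-N approach described above can be sketched as a small C harness. This is only an illustration and not how the numbers in this message were collected: min_time_ms and busy_work are hypothetical names, and the workload is a stand-in for running a query. It assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <time.h>

typedef void (*bench_fn) (void *arg);

/* Stand-in workload (hypothetical); a real test would execute a query. */
static void
busy_work(void *arg)
{
    long            n = *(long *) arg;
    volatile long   sum = 0;

    for (long i = 0; i < n; i++)
        sum += i;
}

/* Run fn(arg) the given number of times; return the fastest run in msec. */
static double
min_time_ms(bench_fn fn, void *arg, int runs)
{
    double  best = -1.0;

    assert(runs > 0);

    for (int i = 0; i < runs; i++)
    {
        struct timespec t0, t1;
        double          ms;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
             (t1.tv_nsec - t0.tv_nsec) / 1e6;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Taking the minimum filters out runs that were disturbed by unrelated activity, at the price of hiding how often such disturbances occur.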
So there is no big difference between the "slow" and "fast" ways of
deforming a tuple.
Moreover, sometimes the "slow" case is faster, although I have to say
that the variance of the results is quite large: about 10%.
But in any case, I can draw two conclusions from these results:

1. Modern platforms are mostly limited by memory access time; the number
of executed instructions is less critical.
This is why the extra processing needed for nullable attributes does not
significantly affect performance.
2. For a large number of attributes, JIT-ing of tuple deforming can improve
speed by up to two times, which is quite a good result from my point of view.
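
For reference, the speedup implied by the seven-aggregate timings above can be worked out directly. The sketch below only divides the "w/o JIT" time by the "with JIT" time for each row. The Timing struct and function names are hypothetical, and the third row is assumed to belong to t3. The ratios come out at roughly 1.67, 1.65 and 1.56.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Timing
{
    double  without_jit_ms;
    double  with_jit_ms;
} Timing;

/* Timings (msec) of the seven-aggregate query, as reported above. */
static const Timing seven_sum_timings[3] = {
    {1239.0, 742.0},
    {1233.0, 747.0},
    {1255.0, 803.0},    /* assumed to be the t3 row */
};

/* Speedup factor: how many times faster the JIT-ed run is. */
static double
speedup(Timing t)
{
    assert(t.with_jit_ms > 0.0);
    return t.without_jit_ms / t.with_jit_ms;
}

static void
print_speedups(void)
{
    for (int i = 0; i < 3; i++)
        printf("row %d: %.2fx\n", i + 1, speedup(seven_sum_timings[i]));
}
```

Given the roughly 10% variance mentioned above, these ratios should be read as approximate.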


Konstantin Knizhnik
Postgres Professional:
The Russian Postgres Company
