JIT compiling expressions/deform + inlining prototype v2.0

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: JIT compiling expressions/deform + inlining prototype v2.0
Date: 2017-09-01 06:41:31
Message-ID: 20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

I previously posted an early prototype [1] of JITing expression evaluation
and tuple deforming. I've worked on it a lot since then.

Here's an initial, not really pretty but functional, submission. It
supports all types of expressions and tuples, and allows, albeit with
some drawbacks, inlining of builtin functions. Between the version at
[1] and this one I'd done some work in C++, because that made it easier
to experiment with LLVM, but I've now translated everything back to C.
I had to re-implement some features due to limitations of the C API.
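
For readers new to the idea, the difference between interpreting an
expression and compiling it can be sketched in a few lines. This is a toy
in Python, not code from the patches: the tuple-based expression tree and
the names `interpret`, `to_source` and `compile_expr` are made up for
illustration, and Python's `exec` merely stands in for emitting native
code through LLVM.

```python
import operator

# Hypothetical mini expression tree: ("const", value), ("var", column_index),
# or (operator_symbol, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(node, row):
    """Walk the tree for every row, dispatching on the node kind each time."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return row[node[1]]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))


def to_source(node):
    """Turn the tree into the text of a Python expression."""
    kind = node[0]
    if kind == "const":
        return repr(node[1])
    if kind == "var":
        return f"row[{int(node[1])}]"
    if kind not in OPS:
        raise ValueError(f"unknown operator: {kind!r}")
    return f"({to_source(node[1])} {kind} {to_source(node[2])})"


def compile_expr(node):
    """Generate one function specialised for this expression, once."""
    source = f"def evaluate(row):\n    return {to_source(node)}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["evaluate"]


# Toy version of l_extendedprice * (1 - l_discount), columns 0 and 1.
expr = ("*", ("var", 0), ("-", ("const", 1), ("var", 1)))
row = (100.0, 0.25)
evaluate = compile_expr(expr)
```

`interpret(expr, row)` and `evaluate(row)` give the same result; the
compiled variant just no longer dispatches on node kinds for every row.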

As a teaser:
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
┌──────────────┬──────────────┬───────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │ sum_base_price │ sum_disc_price │ sum_charge │ avg_qty │ avg_price │ avg_disc │ count_order │
├──────────────┼──────────────┼───────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────┼─────────────┤
│ A │ F │ 188818373 │ 283107483036.109 │ 268952035589.054 │ 279714361804.23 │ 25.5025937044707 │ 38237.6725307617 │ 0.0499976863510723 │ 7403889 │
│ N │ F │ 4913382 │ 7364213967.94998 │ 6995782725.6633 │ 7275821143.98952 │ 25.5321530459003 │ 38267.7833908406 │ 0.0500308669240696 │ 192439 │
│ N │ O │ 375088356 │ 562442339707.852 │ 534321895537.884 │ 555701690243.972 │ 25.4978961033505 │ 38233.9150565265 │ 0.0499956453049625 │ 14710561 │
│ R │ F │ 188960009 │ 283310887148.206 │ 269147687267.211 │ 279912972474.866 │ 25.5132328961366 │ 38252.4148049933 │ 0.0499958481590264 │ 7406353 │
└──────────────┴──────────────┴───────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────┴─────────────┘
(4 rows)

Time: 4367.486 ms (00:04.367)
tpch_5[9586][1]=# set jit_expressions=1;set jit_tuple_deforming=1;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)

Time: 3158.575 ms (00:03.159)

tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 4383.562 ms (00:04.384)

The potential wins of JITing itself are considerably larger than the
already significant gains demonstrated above - this version doesn't
exactly generate the nicest native code around. After these patches the
bottlenecks for TPC-H's Q01 are largely inside the float* functions and
the non-expressionified execGrouping.c code. The latter needs to be
expressionified to benefit from JIT - that shouldn't be very hard.
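
To put the timings above in relative terms, here is the arithmetic as a
small Python sketch. The three numbers are the ones from the psql session;
averaging the two non-JIT runs to get a baseline is an assumption of this
sketch, not something measured separately.

```python
# Timings from the psql session above, in milliseconds.
no_jit_ms = [4367.486, 4383.562]
jit_ms = 3158.575

# Assumption: use the mean of the two non-JIT runs as the baseline.
baseline_ms = sum(no_jit_ms) / len(no_jit_ms)

speedup = baseline_ms / jit_ms
saved_pct = 100.0 * (baseline_ms - jit_ms) / baseline_ms

print(f"baseline {baseline_ms:.3f} ms, "
      f"speedup {speedup:.2f}x, {saved_pct:.1f}% less runtime")
```

That works out to roughly a 1.4x speedup, i.e. a bit over a quarter less
runtime for Q01 on this machine.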

Code generation can be improved by moving more of the variable data
into LLVM-allocated stack data; that also has other benefits.

The patch series currently consists of the following:

0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch
- boring prep work

0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch
- for JITed deforming we need to know whether a slot's tupledesc will
change

0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch
- boring

0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch
- infrastructure for llvm, including memory lifetime management, and
bulk emission of functions.

0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch
- boring, prep work for expression jiting

0006-WIP-deduplicate-int-float-overflow-handling-code.patch
- boring

0007-Pass-through-PlanState-parent-to-expression-instanti.patch
- boring

0008-WIP-JIT-compile-expression.patch
- that's the biggest patch, actually adding JITing
- code needs to be better documented, tested, and deduplicated

0009-Simplify-aggregate-code-a-bit.patch
0010-More-efficient-AggState-pertrans-iteration.patch
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch
0012-Centralize-slot-deforming-logic-a-bit.patch
- boring, mostly to make the comparison between JITed and non-JITed a bit
fairer and to remove other unnecessary bottlenecks.

0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch
- this isn't clean enough.

0014-WIP-JITed-tuple-deforming.patch
- JITs deforming, but only when called from within an expression;
there we know which columns need to be deformed etc.
- Not clear what'd be a good way to also JIT other deforming without
additional infrastructure - doing a separate function emission for
every slot_deform_tuple() is unattractive performance-wise and
memory-lifetime-wise; I did have that at first.

0015-WIP-Expression-based-agg-transition.patch
- allows JITing of aggregate transition invocation, and also speeds up
aggregates without JIT.

0016-Hacky-Preliminary-inlining-implementation.patch
- allows inlining of functions by using bitcode. That bitcode can be
loaded from a list of directories - as long as it is compatibly
configured, the bitcode doesn't have to be generated by the same compiler
as the postgres binary, i.e. gcc postgres + clang bitcode works.
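
Why 0002 matters for 0014 can be illustrated with a toy model. The sketch
below is Python, not the patch code, and makes strong simplifying
assumptions: columns are fixed-width, never NULL, and packed without
alignment padding. `TUPDESC`, `deform_generic` and `make_deformer` are
hypothetical names. The point is only that once a descriptor is known not
to change, the work of walking it can be done once up front instead of for
every tuple, and only for the columns an expression actually needs.

```python
import struct

# Hypothetical stand-in for a tuple descriptor: one struct format code per
# fixed-width column ("i" = int32, "q" = int64, "d" = float64).
TUPDESC = ("i", "q", "d", "i")


def deform_generic(data, tupdesc, natts):
    """Walk the descriptor on every call and decode the first natts columns."""
    values = []
    offset = 0
    for code in tupdesc[:natts]:
        fmt = "<" + code  # little-endian, no padding
        values.append(struct.unpack_from(fmt, data, offset)[0])
        offset += struct.calcsize(fmt)
    return values


def make_deformer(tupdesc, natts):
    """Build a decoder once, for a descriptor that is known not to change."""
    layout = struct.Struct("<" + "".join(tupdesc[:natts]))

    def deform(data):
        # Offsets and types are baked into `layout`; nothing is re-derived.
        return list(layout.unpack_from(data))

    return deform


# One toy "tuple", and a deformer for just the first three columns.
data = struct.pack("<iqdi", 7, 1234567890123, 0.05, 42)
deform_first3 = make_deformer(TUPDESC, 3)
```

Both `deform_generic(data, TUPDESC, 3)` and `deform_first3(data)` return
the same three values. If the slot's descriptor could change between calls,
`deform_first3` would silently decode with a stale layout, which is why
knowing that the tupledesc is fixed is a prerequisite.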

I've whacked this around quite heavily today, so this likely has some new
bugs, sorry for that :(

I plan to spend some considerable time over the next weeks to clean this
up and address some of the areas where the performance isn't yet as good
as desirable.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

Attachment Content-Type Size
0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch text/x-diff 2.0 KB
0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch text/x-diff 68.3 KB
0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch text/x-diff 5.9 KB
0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch text/x-diff 25.2 KB
0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch text/x-diff 13.5 KB
0006-WIP-deduplicate-int-float-overflow-handling-code.patch text/x-diff 10.1 KB
0007-Pass-through-PlanState-parent-to-expression-instanti.patch text/x-diff 3.2 KB
0008-WIP-JIT-compile-expression.patch text/x-diff 77.3 KB
0009-Simplify-aggregate-code-a-bit.patch text/x-diff 9.7 KB
0010-More-efficient-AggState-pertrans-iteration.patch text/x-diff 4.1 KB
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch text/x-diff 2.3 KB
0012-Centralize-slot-deforming-logic-a-bit.patch text/x-diff 8.2 KB
0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch text/x-diff 1.4 KB
0014-WIP-JITed-tuple-deforming.patch text/x-diff 26.0 KB
0015-WIP-Expression-based-agg-transition.patch text/x-diff 62.9 KB
0016-Hacky-Preliminary-inlining-implementation.patch text/x-diff 16.3 KB
