Why JIT speed improvement is so modest?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Why JIT speed improvement is so modest?
Date: 2019-11-25 15:09:29
Message-ID: 809c295d-9d0b-6a8f-c579-8b0ffe565cdc@postgrespro.ru
Lists: pgsql-hackers

Right now JIT provides about a 30% improvement on TPC-H query Q1:

https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/

I wonder why, even for this query, which seems to be an ideal use case for
JIT, we get such a modest improvement.
I raised this question several years ago, but at that time JIT was
assumed to be at an early stage of development, and performance was
less critical than the required infrastructure changes. Now JIT seems
to be stable enough and is switched on by default.
Vitesse DB reports an 8x speedup on Q1, and the ISP-RAS JIT version
provides a 3x speedup on Q1:

https://www.pgcon.org/2017/schedule/attachments/467_PGCon%202017-05-26%2015-00%20ISPRAS%20Dynamic%20Compilation%20of%20SQL%20Queries%20in%20PostgreSQL%20Using%20LLVM%20JIT.pdf

According to this presentation, Q1 spends 6% of its time in ExecQual and
75% in ExecAgg.

VOPS provides a 10x improvement on Q1.

My hypothesis was that this difference is caused by the way aggregates
are calculated.
Postgres uses the Youngs-Cramer algorithm, while both the ISPRAS JIT
version and my VOPS simply accumulate results in a variable of type double.
I rewrote VOPS to use the same algorithm as Postgres, but VOPS is still
about 10 times faster.

Results of Q1 on TPC-H data at scale factor 10, on my desktop with
parallel execution enabled:
no-JIT:                          5640 msec
JIT:                             4590 msec
VOPS:                             452 msec
VOPS + Youngs-Cramer algorithm:   610 msec

Below are the tops of the profiles (functions taking more than 1% of the time):

JIT:
  10.98%  postgres  postgres            [.] float4_accum
   8.40%  postgres  postgres            [.] float8_accum
   7.51%  postgres  postgres            [.] HeapTupleSatisfiesVisibility
   5.92%  postgres  postgres            [.] ExecInterpExpr
   5.63%  postgres  postgres            [.] tts_minimal_getsomeattrs
   4.35%  postgres  postgres            [.] lookup_hash_entries
   3.72%  postgres  postgres            [.] TupleHashTableHash.isra.8
   2.93%  postgres  postgres            [.] tuplehash_insert
   2.70%  postgres  postgres            [.] heapgettup_pagemode
   2.24%  postgres  postgres            [.] check_float8_array
   2.23%  postgres  postgres            [.] hash_search_with_hash_value
   2.10%  postgres  postgres            [.] ExecScan
   1.90%  postgres  postgres            [.] hash_uint32
   1.57%  postgres  postgres            [.] tts_minimal_clear
   1.53%  postgres  postgres            [.] FunctionCall1Coll
   1.47%  postgres  postgres            [.] pg_detoast_datum
   1.39%  postgres  postgres            [.] heapgetpage
   1.37%  postgres  postgres            [.] TupleHashTableMatch.isra.9
   1.35%  postgres  postgres            [.] ExecStoreBufferHeapTuple
   1.06%  postgres  postgres            [.] LookupTupleHashEntry
   1.06%  postgres  postgres            [.] AggCheckCallContext

no-JIT:
  26.82%  postgres  postgres            [.] ExecInterpExpr
  15.26%  postgres  postgres            [.] tts_buffer_heap_getsomeattrs
   8.27%  postgres  postgres            [.] float4_accum
   7.51%  postgres  postgres            [.] float8_accum
   5.26%  postgres  postgres            [.] HeapTupleSatisfiesVisibility
   2.78%  postgres  postgres            [.] TupleHashTableHash.isra.8
   2.63%  postgres  postgres            [.] tts_minimal_getsomeattrs
   2.54%  postgres  postgres            [.] lookup_hash_entries
   2.05%  postgres  postgres            [.] tuplehash_insert
   1.97%  postgres  postgres            [.] heapgettup_pagemode
   1.72%  postgres  postgres            [.] hash_search_with_hash_value
   1.57%  postgres  postgres            [.] float48mul
   1.55%  postgres  postgres            [.] check_float8_array
   1.48%  postgres  postgres            [.] ExecScan
   1.26%  postgres  postgres            [.] hash_uint32
   1.04%  postgres  postgres            [.] tts_minimal_clear
   1.00%  postgres  postgres            [.] FunctionCall1Coll

VOPS:
  44.25%  postgres  vops.so            [.] vops_avg_state_accumulate
  11.76%  postgres  vops.so            [.] vops_float4_avg_accumulate
   6.14%  postgres  postgres           [.] ExecInterpExpr
   5.89%  postgres  vops.so            [.] vops_float4_sub_lconst
   4.89%  postgres  vops.so            [.] vops_float4_mul
   4.30%  postgres  vops.so            [.] vops_int4_le_rconst
   2.57%  postgres  vops.so            [.] vops_float4_add_lconst
   2.31%  postgres  vops.so            [.] vops_count_accumulate
   2.24%  postgres  postgres           [.] tts_buffer_heap_getsomeattrs
   1.97%  postgres  postgres           [.] heap_page_prune_opt
   1.72%  postgres  postgres           [.] HeapTupleSatisfiesVisibility
   1.67%  postgres  postgres           [.] AllocSetAlloc
   1.47%  postgres  postgres           [.] hash_search_with_hash_value

In theory, by eliminating interpretation overhead, JIT should provide
performance comparable to a vectorized executor.
In most programming languages, using a JIT compiler instead of a bytecode
interpreter provides about a 10x speed improvement.
Certainly a DBMS engine is very different from a traditional interpreter:
a lot of time is spent in tuple packing/unpacking (although JIT is also
used here), in heap traversal, and so on. But it is still unclear to me
why, if the ISPRAS measurements were correct and we really spend 75% of
Q1 time in aggregation, JIT was not able to speed up Q1 significantly
(by several times).
The experiment with VOPS shows that the aggregation algorithm itself is
not the bottleneck.
The profiles also give no answer to this question.
Any ideas?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
