Re: Why JIT speed improvement is so modest?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why JIT speed improvement is so modest?
Date: 2019-11-27 15:38:45
Message-ID: 7db99b51-c349-140a-5227-dfd4d61e668e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25.11.2019 18:24, Merlin Moncure wrote:
> On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> JIT was not able to significantly (times) increase speed on Q1 query?
>> Experiment with VOPS shows that used aggregation algorithm itself is not
>> a bottleneck.
>> Profile also give no answer for this question.
>> Any ideas?
> Well, in the VOPS variant around 2/3 of the time is spent in routines
> that are obviously aggregation. In the JIT version, it's around 20%.
> So this suggests that the replacement execution engine is more
> invasive. I would also guess (!) that the VOPS engine optimizes fewer
> classes of query plan. ExecScan for example, looks to be completely
> optimized out VOPS but is still utilized in the JIT engine.

The difference in fraction of time spent in aggregate calculation is not
so large (2 times vs. 10 times).
I suspected that a lot of time is spent in relation traversal code,
tuple unpacking and visibility checks.
To check this hypothesis I have implement in-memory table access method
which stores tuples in unpacked form and
doesn't perform any visibility checks at all.
Results were not so existed. I have to disable parallel execution
(because it is not possible for tuples stored in backend private memory).
Results are the following:

lineitem:               13736 msec
inmem_lineitem:  10044 msec
vops_lineitem:        1945 msec

The profile of inmem_lineitem is the following:

  16.79%  postgres  postgres             [.] float4_accum
  12.86%  postgres  postgres             [.] float8_accum
   5.83%  postgres  postgres             [.] TupleHashTableHash.isra.8
   4.44%  postgres  postgres             [.] lookup_hash_entries
   3.37%  postgres  postgres             [.] check_float8_array
   3.11%  postgres  postgres             [.] tuplehash_insert
   2.91%  postgres  postgres             [.] hash_uint32
   2.83%  postgres  postgres             [.] ExecScan
   2.56%  postgres  postgres             [.] inmem_getnextslot
   2.22%  postgres  postgres             [.] FunctionCall1Coll
   2.14%  postgres  postgres             [.] LookupTupleHashEntry
   1.95%  postgres  postgres             [.] TupleHashTableMatch.isra.9
   1.76%  postgres  postgres             [.] pg_detoast_datum
   1.58%  postgres  postgres             [.] AggCheckCallContext
   1.57%  postgres  postgres             [.] tts_minimal_clear
   1.35%  postgres  perf-3054.map        [.] 0x00007f558db60010
   1.23%  postgres  postgres             [.] fetch_input_tuple
   1.15%  postgres  postgres             [.] SeqNext
   1.06%  postgres  postgres             [.] ExecAgg
   1.00%  postgres  postgres             [.] tts_minimal_store_tuple

So now fraction of time spent in aggregation is increased to 30% (vs.
20% for lineitem and 42% for vops_lineitem).
Looks like the main bottleneck now is hashagg. VOPS is accessing hash
about 10 times less (because it accumulates values for the whole tile).
And it explains still large difference bwtween vops_lineitem and
inmem_lineitem.

If we remove aggregation and rewrite Q1 query as:
select
    avg(l_quantity) as sum_qty,
    avg(l_extendedprice) as sum_base_price,
    avg(l_extendedprice*(1-l_discount)) as sum_disc_price,
    avg(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    inmem_lineitem
where
    l_shipdate <= '1998-12-01';

then results are the following:
lineitem:               9805 msec
inmem_lineitem:  6257 msec
vops_lineitem:      1865 msec

and now profile of inmem_lineitem is:

  25.27%  postgres  postgres           [.] float4_accum
  21.86%  postgres  postgres           [.] float8_accum
   5.49%  postgres  postgres           [.] check_float8_array
   4.57%  postgres  postgres           [.] ExecScan
   2.61%  postgres  postgres           [.] AggCheckCallContext
   2.30%  postgres  postgres           [.] pg_detoast_datum
   2.10%  postgres  postgres           [.] inmem_getnextslot
   1.81%  postgres  postgres           [.] SeqNext
   1.73%  postgres  postgres           [.] fetch_input_tuple
   1.61%  postgres  postgres           [.] ExecAgg
   1.23%  postgres  postgres           [.] MemoryContextReset

But still more than 3 times difference with VOPS!
Something is wrong here...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-11-27 16:01:49 Re: allow_system_table_mods stuff
Previous Message Maxence Ahlouche 2019-11-27 15:30:12 Re: Invisible PROMPT2