Quick Links

Re: Why JIT speed improvement is so modest?

From:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Why JIT speed improvement is so modest?
Date:	2019-11-27 15:38:45
Message-ID:	7db99b51-c349-140a-5227-dfd4d61e668e@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 25.11.2019 18:24, Merlin Moncure wrote:
> On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> JIT was not able to significantly (times) increase speed on Q1 query?
>> Experiment with VOPS shows that used aggregation algorithm itself is not
>> a bottleneck.
>> Profile also give no answer for this question.
>> Any ideas?
> Well, in the VOPS variant around 2/3 of the time is spent in routines
> that are obviously aggregation. In the JIT version, it's around 20%.
> So this suggests that the replacement execution engine is more
> invasive. I would also guess (!) that the VOPS engine optimizes fewer
> classes of query plan. ExecScan for example, looks to be completely
> optimized out VOPS but is still utilized in the JIT engine.

The difference in fraction of time spent in aggregate calculation is not
so large (2 times vs. 10 times).
I suspected that a lot of time is spent in relation traversal code,
tuple unpacking and visibility checks.
To check this hypothesis I have implement in-memory table access method
which stores tuples in unpacked form and
doesn't perform any visibility checks at all.
Results were not so existed. I have to disable parallel execution
(because it is not possible for tuples stored in backend private memory).
Results are the following:

lineitem: 13736 msec
inmem_lineitem: 10044 msec
vops_lineitem: 1945 msec

The profile of inmem_lineitem is the following:

16.79% postgres postgres             [.] float4_accum
12.86% postgres postgres             [.] float8_accum
   5.83% postgres postgres             [.] TupleHashTableHash.isra.8
   4.44% postgres postgres             [.] lookup_hash_entries
   3.37% postgres postgres             [.] check_float8_array
   3.11% postgres postgres             [.] tuplehash_insert
   2.91% postgres postgres             [.] hash_uint32
   2.83% postgres postgres             [.] ExecScan
   2.56% postgres postgres             [.] inmem_getnextslot
   2.22% postgres postgres             [.] FunctionCall1Coll
   2.14% postgres postgres             [.] LookupTupleHashEntry
   1.95% postgres postgres             [.] TupleHashTableMatch.isra.9
   1.76% postgres postgres             [.] pg_detoast_datum
   1.58% postgres postgres             [.] AggCheckCallContext
   1.57% postgres postgres             [.] tts_minimal_clear
   1.35% postgres perf-3054.map        [.] 0x00007f558db60010
   1.23% postgres postgres             [.] fetch_input_tuple
   1.15% postgres postgres             [.] SeqNext
   1.06% postgres postgres             [.] ExecAgg
   1.00% postgres postgres             [.] tts_minimal_store_tuple

So now fraction of time spent in aggregation is increased to 30% (vs.
20% for lineitem and 42% for vops_lineitem).
Looks like the main bottleneck now is hashagg. VOPS is accessing hash
about 10 times less (because it accumulates values for the whole tile).
And it explains still large difference bwtween vops_lineitem and
inmem_lineitem.

If we remove aggregation and rewrite Q1 query as:
select
    avg(l_quantity) as sum_qty,
    avg(l_extendedprice) as sum_base_price,
    avg(l_extendedprice*(1-l_discount)) as sum_disc_price,
    avg(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    inmem_lineitem
where
    l_shipdate <= '1998-12-01';

then results are the following:
lineitem: 9805 msec
inmem_lineitem: 6257 msec
vops_lineitem: 1865 msec

and now profile of inmem_lineitem is:

25.27% postgres postgres           [.] float4_accum
21.86% postgres postgres           [.] float8_accum
   5.49% postgres postgres           [.] check_float8_array
   4.57% postgres postgres           [.] ExecScan
   2.61% postgres postgres           [.] AggCheckCallContext
   2.30% postgres postgres           [.] pg_detoast_datum
   2.10% postgres postgres           [.] inmem_getnextslot
   1.81% postgres postgres           [.] SeqNext
   1.73% postgres postgres           [.] fetch_input_tuple
   1.61% postgres postgres           [.] ExecAgg
   1.23% postgres postgres           [.] MemoryContextReset

But still more than 3 times difference with VOPS!
Something is wrong here...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Re: Why JIT speed improvement is so modest? at 2019-11-25 15:24:29 from Merlin Moncure

Responses

Re: Why JIT speed improvement is so modest? at 2019-11-27 16:05:04 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2019-11-27 16:01:49	Re: allow_system_table_mods stuff
Previous Message	Maxence Ahlouche	2019-11-27 15:30:12	Re: Invisible PROMPT2