Re: Yet another vectorized engine

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Gang Xiong <gxiong(at)pivotal(dot)io>, Asim R P <apraveen(at)pivotal(dot)io>, Ning Yu <nyu(at)pivotal(dot)io>
Subject: Re: Yet another vectorized engine
Date: 2020-02-26 11:59:27
Message-ID: 302c6f54-668a-dd38-f8ba-29f0cd667e02@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.02.2020 13:11, Hubert Zhang wrote:
>
>
> and with JIT:
>
>  13.88%  postgres  postgres             [.]
> tts_buffer_heap_getsomeattrs
>    7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
>    6.03%  postgres  postgres             [.]
> HeapTupleSatisfiesVisibility
>    5.55%  postgres  postgres             [.] bpchareq
>    4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
>    4.19%  postgres  postgres             [.] hashbpchar
>    4.09%  postgres  vectorize_engine.so  [.] vfloat8pl
>
>
> I also tested Q1 with your latest code. Result of vectorized is still
> slow.
> PG13 native: 38 secs
> PG13 Vec: 30 secs
> PG13 JIT: 23 secs
> PG13 JIT+Vec: 27 secs
>

It is strange that your results are much slower than my and profile is
very different.
Which postgres configuration you are using?

> My perf result is as belows. There are three parts:
> 1. lookup_hash_entry(43.5%) this part is not vectorized yet.
It is vectorized in some sense: lookup_hash_entry performs bulk of hash
lookups and pass array with results of such lookups to aggregate
transmit functions.
It will be possible to significantly increase speed of HashAgg if we
store data in order of grouping attributes and use RLE (run length
encoding) to peform just one
hash lookup for group of values. But it requires creation of special
partitions (like it is done in Vertica and VOPS).

> 2. scan part: fetch_input_tuple(36%)
> 3. vadvance_aggregates part(20%)
> I also perfed on PG96 vectorized version and got similar perf results
> and running time of vectorized PG96 and PG13 are also similar. But
> PG13 is much faster than PG96. So I just wonder whether we merge all
> the latest executor code of PG13 into the vectorized PG13 branch?

Sorry, I do not understand the question. vectorize_executor contains
patched versions of nodeSeqscan  and nodeAgg from standard executor.
When performing porting to PG13, I took the latest version of nodeAgg
and tried to apply your patches to it. Certainly not always it was
possible and I have to rewrite a lt of places. Concerning nodeSeqscan -
I took old version from vectorize_executor and port it to PG13.

>
> - agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ +
> 39.07% LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ -
> 36.06% fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03%
> VExecScan ▒ - 34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch
> (inlined) ▒ - VSeqNext ▒ + 16.64% table_scan_getnextslot (inlined) ▒ -
> 10.29% slot_getsomeattrs (inlined) ▒ - 10.17% slot_getsomeattrs_int ▒
> + tts_buffer_heap_getsomeattrs ▒ 7.14% VExecStoreColumns ▒ + 1.38%
> ExecQual (inlined) ▒ - 20.30% Vadvance_aggregates (inlined) ▒ - 17.46%
> Vadvance_transition_function (inlined) ▒ + 11.95% vfloat8_accum ▒ +
> 4.74% vfloat8pl ▒ 0.75% vint8inc_any ▒ + 2.77% ExecProject (inlined)
>

It is strange that I am not seeing lookup_hash_entry in profile in my case.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-02-26 12:19:02 Re: Wait event that should be reported while waiting for WAL archiving to finish
Previous Message Andrew Dunstan 2020-02-26 11:15:58 Re: Resolving the python 2 -> python 3 mess