Re: Yet another vectorized engine

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Gang Xiong <gxiong(at)pivotal(dot)io>, Asim R P <apraveen(at)pivotal(dot)io>, Ning Yu <nyu(at)pivotal(dot)io>
Subject: Re: Yet another vectorized engine
Date: 2020-02-26 10:11:26
Message-ID: CAB0yre=yjWj-=X-aL6cYc0icN4Md9VKx=244w19Sn4RZ1O4sCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Konstantin,

On Tue, Feb 25, 2020 at 6:44 PM Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
>
> On 25.02.2020 11:06, Hubert Zhang wrote:
>
> Hi Konstantin,
>
> I checkout your branch pg13 in repo
> https://github.com/zhangh43/vectorize_engine
> After I fixed some compile error, I tested Q1 on TPCH-10G
> The result is different from yours and vectorize version is too slow. Note
> that I disable parallel worker by default.
> no JIT no Vectorize: 36 secs
> with JIT only: 23 secs
> with Vectorize only: 33 secs
> JIT + Vectorize: 29 secs
>
> My config option is `CFLAGS='-O3 -g -march=native'
> --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm`
> I will do some spike on why vectorized is so slow. Could you please
> provide your compile option and the TPCH dataset size and your
> queries(standard Q1?) to help me to debug on it.
>
>
>
> Hi, Hubert
>
> Sorry, looks like I have used slightly deteriorated snapshot of master so
> I have not noticed some problems.
> Fixes are committed.
>
> Most of the time is spent in unpacking heap tuple
> (tts_buffer_heap_getsomeattrs):
>
> 24.66% postgres postgres [.] tts_buffer_heap_getsomeattrs
> 8.28% postgres vectorize_engine.so [.] VExecStoreColumns
> 5.94% postgres postgres [.] HeapTupleSatisfiesVisibility
> 4.21% postgres postgres [.] bpchareq
> 4.12% postgres vectorize_engine.so [.] vfloat8_accum
>
>
> In my version of nodeSeqscan I do not keep all fetched 1024 heap tuples
> but stored there attribute values in vector columns immediately.
> But to avoid extraction of useless data it is necessary to know list of
> used columns.
> The same problem is solved in zedstore, but unfortunately there is no
> existed method in Postgres to get list
> of used attributes. I have done it but my last implementation contains
> error which cause loading of all columns.
> Fixed version is committed.
>
> Now profile without JIT is:
>
> 15.52% postgres postgres [.] tts_buffer_heap_getsomeattrs
> 10.25% postgres postgres [.] ExecInterpExpr
> 6.54% postgres postgres [.] HeapTupleSatisfiesVisibility
> 5.12% postgres vectorize_engine.so [.] VExecStoreColumns
> 4.86% postgres postgres [.] bpchareq
> 4.80% postgres vectorize_engine.so [.] vfloat8_accum
> 3.78% postgres postgres [.] tts_minimal_getsomeattrs
> 3.66% postgres vectorize_engine.so [.] VExecAgg
> 3.38% postgres postgres [.] hashbpchar
>
> and with JIT:
>
> 13.88% postgres postgres [.] tts_buffer_heap_getsomeattrs
> 7.15% postgres vectorize_engine.so [.] vfloat8_accum
> 6.03% postgres postgres [.] HeapTupleSatisfiesVisibility
> 5.55% postgres postgres [.] bpchareq
> 4.42% postgres vectorize_engine.so [.] VExecStoreColumns
> 4.19% postgres postgres [.] hashbpchar
> 4.09% postgres vectorize_engine.so [.] vfloat8pl
>
>
I also tested Q1 with your latest code. Result of vectorized is still slow.
PG13 native: 38 secs
PG13 Vec: 30 secs
PG13 JIT: 23 secs
PG13 JIT+Vec: 27 secs

My perf result is as belows. There are three parts:
1. lookup_hash_entry(43.5%) this part is not vectorized yet.
2. scan part: fetch_input_tuple(36%)
3. vadvance_aggregates part(20%)
I also perfed on PG96 vectorized version and got similar perf results and
running time of vectorized PG96 and PG13 are also similar. But PG13 is much
faster than PG96. So I just wonder whether we merge all the latest executor
code of PG13 into the vectorized PG13 branch?

- agg_fill_hash_table ◆ - 43.50% lookup_hash_entry (inlined) ▒ + 39.07%
LookupTupleHashEntry ▒ 0.56% ExecClearTuple (inlined) ▒ - 36.06%
fetch_input_tuple ▒ - ExecProcNode (inlined) ▒ - 36.03% VExecScan ▒ -
34.60% ExecScanFetch (inlined) ▒ - ExecScanFetch (inlined) ▒ - VSeqNext ▒ +
16.64% table_scan_getnextslot (inlined) ▒ - 10.29% slot_getsomeattrs
(inlined) ▒ - 10.17% slot_getsomeattrs_int ▒ + tts_buffer_heap_getsomeattrs
▒ 7.14% VExecStoreColumns ▒ + 1.38% ExecQual (inlined) ▒ - 20.30%
Vadvance_aggregates (inlined) ▒ - 17.46% Vadvance_transition_function
(inlined) ▒ + 11.95% vfloat8_accum ▒ + 4.74% vfloat8pl ▒ 0.75% vint8inc_any
▒ + 2.77% ExecProject (inlined)

--
Thanks

Hubert Zhang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2020-02-26 10:38:57 Re: truncating timestamps on arbitrary intervals
Previous Message Konstantin Knizhnik 2020-02-26 09:51:35 Re: Yet another vectorized engine