Re: Yet another vectorized engine

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Gang Xiong <gxiong(at)pivotal(dot)io>, Asim R P <apraveen(at)pivotal(dot)io>, Ning Yu <nyu(at)pivotal(dot)io>
Subject: Re: Yet another vectorized engine
Date: 2020-02-25 10:44:25
Message-ID: cc22e8c5-98d9-1337-73d9-8ad70bc8cbc5@postgrespro.ru
Lists: pgsql-hackers

On 25.02.2020 11:06, Hubert Zhang wrote:
> Hi Konstantin,
>
> I checked out your pg13 branch in the repo
> https://github.com/zhangh43/vectorize_engine
> After fixing some compile errors, I tested Q1 on TPCH-10G.
> The result is different from yours, and the vectorized version is too slow.
> Note that I disable parallel workers by default.
> no JIT, no Vectorize:  36 secs
> with JIT only:         23 secs
> with Vectorize only:   33 secs
> JIT + Vectorize:       29 secs
>
> My config options are `CFLAGS='-O3 -g -march=native'
> --prefix=/usr/local/pgsql/ --disable-cassert --enable-debug --with-llvm`
> I will investigate why the vectorized version is so slow. Could you please
> provide your compile options, the TPCH dataset size, and your
> queries (standard Q1?) to help me debug it?
>

Hi, Hubert

Sorry, it looks like I used a slightly outdated snapshot of master,
so I did not notice some problems.
The fixes are committed.

Most of the time is spent unpacking heap tuples
(tts_buffer_heap_getsomeattrs):

  24.66%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
   8.28%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   5.94%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   4.21%  postgres  postgres             [.] bpchareq
   4.12%  postgres  vectorize_engine.so  [.] vfloat8_accum

In my version of nodeSeqscan I do not keep all 1024 fetched heap tuples;
instead I store their attribute values in vector columns immediately.
But to avoid extracting useless data, it is necessary to know the list of
used columns.
The same problem is solved in zedstore, but unfortunately there is no
existing method in Postgres to get the list
of used attributes. I have implemented one, but my last implementation
contained an error which caused all columns to be loaded.
The fixed version is committed.
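
As an illustration only (this is not the code from vectorize_engine), the idea above can be sketched in plain C. Everything here is an assumption made for the example: the RowTuple and ColumnBatch types, the fixed four-column table of doubles, and the store_columns/sum_column helpers are hypothetical and do not correspond to Postgres or extension APIs. Only the batch size of 1024 comes from the text. The sketch shows each fetched row being scattered into per-column arrays right away, with columns not marked as used being skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 1024   /* rows per batch, as mentioned in the text */
#define NUM_COLS   4      /* hypothetical table width */

/* Hypothetical row-format tuple: one value per column. */
typedef struct RowTuple {
    double values[NUM_COLS];
} RowTuple;

/* Hypothetical column batch: one contiguous array per column. */
typedef struct ColumnBatch {
    double cols[NUM_COLS][BATCH_SIZE];
    size_t nrows;
} ColumnBatch;

/*
 * Append one row to the batch, copying only the columns flagged in `used`.
 * Unused columns are never touched, so no work is spent extracting them.
 * Returns false if the batch is already full.
 */
static bool
store_columns(ColumnBatch *batch, const RowTuple *row,
              const bool used[NUM_COLS])
{
    if (batch->nrows >= BATCH_SIZE)
        return false;
    for (int c = 0; c < NUM_COLS; c++) {
        if (used[c])
            batch->cols[c][batch->nrows] = row->values[c];
    }
    batch->nrows++;
    return true;
}

/* Sum one column of the batch in a tight loop over contiguous values. */
static double
sum_column(const ColumnBatch *batch, int col)
{
    assert(col >= 0 && col < NUM_COLS);
    double sum = 0.0;
    for (size_t i = 0; i < batch->nrows; i++)
        sum += batch->cols[col][i];
    return sum;
}
```

In this sketch the caller supplies the `used` flags; in the real executor that list would have to be derived from the query plan, which is exactly the part for which no existing method is available.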

Now profile without JIT is:

  15.52%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
  10.25%  postgres  postgres             [.] ExecInterpExpr
   6.54%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   5.12%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   4.86%  postgres  postgres             [.] bpchareq
   4.80%  postgres  vectorize_engine.so  [.] vfloat8_accum
   3.78%  postgres  postgres             [.] tts_minimal_getsomeattrs
   3.66%  postgres  vectorize_engine.so  [.] VExecAgg
   3.38%  postgres  postgres             [.] hashbpchar

and with JIT:

  13.88%  postgres  postgres             [.] tts_buffer_heap_getsomeattrs
   7.15%  postgres  vectorize_engine.so  [.] vfloat8_accum
   6.03%  postgres  postgres             [.] HeapTupleSatisfiesVisibility
   5.55%  postgres  postgres             [.] bpchareq
   4.42%  postgres  vectorize_engine.so  [.] VExecStoreColumns
   4.19%  postgres  postgres             [.] hashbpchar
   4.09%  postgres  vectorize_engine.so  [.] vfloat8pl

> On Mon, Feb 24, 2020 at 8:43 PM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:
>
> Hi Konstantin,
> I have added you as a collaborator on github. Please accept and
> try again.
> I think non-collaborators can also open pull requests.
>
> On Mon, Feb 24, 2020 at 8:02 PM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
>
> On 24.02.2020 05:08, Hubert Zhang wrote:
>> Hi
>>
>> On Sat, Feb 22, 2020 at 12:58 AM Konstantin Knizhnik
>> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>>
>>
>> On 12.02.2020 13:12, Hubert Zhang wrote:
>>> On Tue, Feb 11, 2020 at 1:20 AM Konstantin Knizhnik
>>> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>>
>>>
>>> So it looks like PG-13 provides significant advantages
>>> in OLAP queries compared with 9.6!
>>> That certainly doesn't mean that a vectorized executor
>>> is not needed for the new version of Postgres.
>>> Once ported, I expect it to provide a
>>> comparable improvement in performance.
>>>
>>> But in any case I think that a vectorized executor
>>> makes sense only when combined with a columnar store.
>>>
>>>
>>> Thanks for the test. +1 on combining vectorization
>>> with a columnar store. I think when we support this extension
>>> on master, we could try the new zedstore.
>>> I'm not active on this work now, but will continue when
>>> I have time. Feel free to join and bring vops's features into
>>> this extension.
>>> Thanks
>>>
>>> Hubert Zhang
>>
>> I have ported vectorize_engine to master.
>> It took longer than I expected: a lot of things have
>> changed in the executor.
>>
>> Results are the following:
>>
>>
>> par.workers | PG9_6         | PG9_6        | master        | master        | master       | master
>>             | vectorize=off | vectorize=on | vectorize=off | vectorize=off | vectorize=on | vectorize=on
>>             |               |              | jit=on        | jit=off       | jit=on       | jit=off
>> ------------+---------------+--------------+---------------+---------------+--------------+--------------
>> 0           | 36            | 20           | 16            | 25.5          | 15           | 17.5
>> 4           | 10            | -            | 5             | 7             | -            | -
>>
>>
>> So it proves the theory that JIT provides almost the same
>> speedup as the vectorized executor (both eliminate
>> interpretation overhead, but in different ways).
>> I am still not sure that we need a vectorized executor,
>> because with the standard heap it provides almost no
>> improvement compared with the current JIT version.
>> But in any case I am going to test it with vertical
>> storage (zedstore or cstore).
>>
>>
>> Thanks for the porting and testing.
>> Yes, PG master and 9.6 have many changes, not only executor,
>> but also tupletableslot interface.
>>
>> What matters for the performance of JIT and vectorization is the
>> implementation. This is just the beginning of the vectorization
>> work; as your vops extension showed, vectorization
>> could run 10 times faster in PG. With the overhead of row
>> storage (heap), we may not reach that speedup, but I think we
>> could do better. Also +1 on vertical storage.
>>
>> BTW, you are welcome to submit a PR for the PG master version.
>
>
> Sorry, but I have no permission to push changes to your
> repository.
> I can certainly create my own fork of vectorize_engine, but I
> think it would be better if I pushed the pg13 branch to your repository.
>
>
>
>
> --
> Thanks
>
> Hubert Zhang
>

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
