Re: VOPS: vectorized executor for Postgres: how to speedup OLAP queries more than 10 times without changing anything in Postgres executor

From: Thom Brown <thom(at)linux(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VOPS: vectorized executor for Postgres: how to speedup OLAP queries more than 10 times without changing anything in Postgres executor
Date: 2017-02-16 17:42:40
Message-ID: CAA-aLv5k0y6NzzD6j+M-jW=WbW7xbFCfXGB1DK1zdZ2-E_+iAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 February 2017 at 17:00, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> More progress in vectorized Postgres extension (VOPS). It is not required
> any more to use some special functions in queries.
> You can use vector operators in query with standard SQL and still get ten
> times improvement on some queries.
> VOPS extension now uses post parse analyze hook to transform query.
> I really impressed by flexibility and extensibility of Postgres type system.
> User defined types&operatpors&casts do most of the work.
>
> It is still responsibility of programmer or database administrator to create
> proper projections
> of original table. This projections need to use tiles types for some
> attributes (vops_float4,...).
> Then you can query this table using standard SQL. And this query will be
> executed using vector operations!
>
> Example of such TPC-H queries:
>
> Q1:
> select
> l_returnflag,
> l_linestatus,
> sum(l_quantity) as sum_qty,
> sum(l_extendedprice) as sum_base_price,
> sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
> sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
> avg(l_quantity) as avg_qty,
> avg(l_extendedprice) as avg_price,
> avg(l_discount) as avg_disc,
> count(*) as count_order
> from
> vops_lineitem_projection
> where
> l_shipdate <= '1998-12-01'::date
> group by
> l_returnflag,
> l_linestatus
> order by
> l_returnflag,
> l_linestatus;
>
>
>
> Q6:
> select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01'::date and '1997-01-01'::date
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;

>
> On 13.02.2017 17:12, Konstantin Knizhnik wrote:
>>
>> Hello hackers,
>>
>> There were many discussions concerning possible ways of speeding-up
>> Postgres. Different approaches were suggested:
>>
>> - JIT (now we have three different prototype implementations based on
>> LLVM)
>> - Chunked (vectorized) executor
>> - Replacing pull with push
>> - Columnar store (cstore_fdw, IMCS)
>> - Optimizing and improving current executor (reducing tuple deform
>> overhead, function call overhead,...)
>>
>> Obviously the best result can be achieved in case of combining all this
>> approaches. But actually them are more or less interchangeable: vectorized
>> execution is not eliminating interpretation overhead, but it is divided by
>> vector size and becomes less critical.
>>
>> I decided to write small prototype to estimate possible speed improvement
>> of vectorized executor. I created special types representing "tile" and
>> implement standard SQL operators for them. So neither Postgres planer,
>> nether Postgres executor, nether Postgres heap manager are changed. But I
>> was able to reach more than 10 times speed improvement on TPC-H Q1/Q6
>> queries!

Impressive work!

Thom

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-02-16 17:45:57 Re: duplicate "median" entry in doc
Previous Message Robert Haas 2017-02-16 17:30:42 Re: Parallel tuplesort (for parallel B-Tree index creation)