Re: [WIP]Vertical Clustered Index (columnar store extension)

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP]Vertical Clustered Index (columnar store extension)
Date: 2017-02-14 01:59:43
Message-ID: CAJrrPGdjQ0MOS_LqjSu5=nw-Od1p2voMHhOrWYu8=SL5vAL6aA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 2:57 AM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> Hi,
>
> I wonder if it is possible to somehow benchmark your clustered index
> implementation.
> I tried to create VCI index for lineitem table from TPC and run Q6 query.
> After index creation Postgres is not using parallel execution plan any
> more but speed of sequential plan is not changed
> and nothing in query execution plan indicates that VCI index is used:
>
>
> postgres=# explain select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
> QUERY
> PLAN
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> -------------------------------
> ------------------------------------------------------------
> -------------------------
> Finalize Aggregate (cost=608333.85..608333.86 rows=1 width=4)
> -> Gather (cost=608333.23..608333.84 rows=6 width=4)
> Workers Planned: 6
> -> Partial Aggregate (cost=607333.23..607333.24 rows=1 width=4)
> -> Parallel Seq Scan on lineitem_projection
> (cost=0.00..607024.83 rows=61680 width=8)
> Filter: ((l_shipdate >= '1996-01-01'::date) AND
> (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double
> precision) AN
> D (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double
> precision))
> (6 rows)
>
> postgres=# select
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
> revenue
> -------------
> 6.21111e+08
> (1 row)
>
> Time: 1171.324 ms (00:01.171)
>
> postgres=# create index vci_idx on lineitem_projection using
> vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,
> l_returnflag,l_linestatus);
> CREATE INDEX
> Time: 4.705 ms
>
>
> postgres=# explain select
> * from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
>
> QUERY
> PLAN
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> -------------------------------
> -------------------------------------------------------------------
> Seq Scan on lineitem_projection (cost=0.00..382077.00 rows=1 width=22)
> Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <=
> '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND
> (l_discount <= '
> 0.1'::double precision) AND (l_quantity < '24'::double precision))
> (2 rows)
>
> postgres=# select
>
>
> sum(l_extendedprice*l_discount) as revenue
> from
> lineitem_projection
> where
> l_shipdate between '1996-01-01' and '1997-01-01'
> and l_discount between 0.08 and 0.1
> and l_quantity < 24;
> revenue
> ------------
> 6.2112e+08
> (1 row)
>
> Time: 4304.355 ms (00:04.304)
>
>
> I wonder if there is any query which can demonstrate advantages of using
> VCI index?
>

The current patch that I shared doesn't contains the plan and executor
changes to show
the performance benefit of the clustered index. we used custom plan to
generate the plan
for the clustered index. Currently I am working on it to rebase it to
current master and
other necessary changes.

In the current state of the patch, I cannot take any performance tests, as
it needs some
major changes according to the latest PostgreSQL version. I have an old
performance
report that is took on 9.5 attached for your reference.

The current patch that is shared is to find out the best approach in
developing a columnar
storage in PostgreSQL, by adopting Index access methods + additional hooks
or pluggable
storage access methods?

The only problem I can think of pluggable storage methods is, to use the
proper benefits of
columnar storage, the planner and executor needs to be changed to support
vector processing,
But whereas in the current model, we implemented the same with custom plan
and additional
hooks. The same may be possible with pluggable storage methods also.

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
VCI_DBT3_Query_Performance.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 32.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-02-14 02:04:41 Re: Parallel Index Scans
Previous Message Haribabu Kommi 2017-02-14 01:21:46 Re: Parallel bitmap heap scan