Re: In-Memory Columnar Store

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 15:03:46
Message-ID: CAHyXU0wUtR6o4G1KTyCrxEnkHA96=wSNLcEDZHip4zgH2k0s9Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 12, 2013 at 4:02 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 12/12/2013 11:42 AM, Pavel Stehule wrote:
>
> it is an interesting idea. For me, the significant information from this
> comparison is that we are doing something significantly wrong. A memory
> engine should naturally be faster, but I don't think it can be 1000x.
>
>
> Sorry, but I didn't fabricate these results.
> Below is just a snapshot from my computer:
>
>
> postgres=# select DbItem_load();
> dbitem_load
> -------------
> 9999998
> (1 row)
>
> postgres=# \timing
> Timing is on.
> postgres=# select cs_used_memory();
> cs_used_memory
> ----------------
> 4441894912
> (1 row)
>
> postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from
> (select (cs_project_agg(ss1.*)).* from
> (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
> cs_hash_sum(q.score*q.volenquired,
> q.trader||q.desk||q.office) s1,
> cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2)
> ss1) ss2;
> agg_val | cs_cut
> ------------------+------------------------------------------------------------
> 1.50028393511844 | ("John Coltrane","New York Corporates","New York")
> ....
> Time: 506.125 ms
>
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 449328.645 ms
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 441530.689 ms
>
> Please notice that the time of the second execution is almost the same as
> the first, although all the data can fit in the cache!
>
> Certainly it was interesting to me to understand the reason for such bad
> performance, and I found out two things:
>
> 1.
> select sum(score*volenquired)/sum(volenquired) from DbItem group by
> (trader,desk,office);
> and
> select sum(score*volenquired)/sum(volenquired) from DbItem group by
> trader,desk,office;
>
> are not the same queries (which is hard for a C programmer to understand :)
> And the first one is executed significantly slower.
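>
> To illustrate (a sketch; the exact plans depend on the PostgreSQL version
> and data): the first form groups by a single composite
> ROW(trader, desk, office) value, which apparently has no hash support
> here, so the planner falls back to a sort-based GroupAggregate; the
> second form groups by three plain columns and can use a HashAggregate:
>
> -- groups by one composite ROW value: sort-based GroupAggregate
> EXPLAIN SELECT sum(score*volenquired)/sum(volenquired)
> FROM DbItem GROUP BY (trader, desk, office);
>
> -- groups by three separate columns: HashAggregate is possible
> EXPLAIN SELECT sum(score*volenquired)/sum(volenquired)
> FROM DbItem GROUP BY trader, desk, office;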
>
> 2. It is not enough to increase the "shared_buffers" parameter in
> postgresql.conf; "work_mem" is also very important. When I increased it
> from the default 1MB to 1GB, the query execution time was reduced to
> 7107.146 ms. So the real difference is ten times, not 1000 times.
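>
> For example (the exact value should be sized to available RAM), work_mem
> can be raised just for the current session before rerunning the query:
>
> SET work_mem = '1GB';
> SELECT sum(score*volenquired)/sum(volenquired)
> FROM DbItem GROUP BY trader, desk, office;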

Yeah. It's not fair to compare against an implementation that is
constrained to use only 1MB. For analytics work, a huge work_mem is a
pretty typical setting. A 10x improvement is believable considering
you've removed all MVCC overhead, locking, buffer management, etc., and
have a simplified data structure.

merlin
