From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-09 15:45:26
Message-ID: 3978b57e-fe25-ca6b-f56c-48084417e115@postgrespro.ru
Lists: pgsql-hackers
On 09.04.2019 18:08, Heikki Linnakangas wrote:
> On 09/04/2019 18:00, Konstantin Knizhnik wrote:
>> On 09.04.2019 17:09, Konstantin Knizhnik wrote:
>>> standard Postgres heap and my VOPS extension.
>>> As test data I used the TPC-H benchmark (actually only the lineitem
>>> table, generated with the tpch-dbgen utility at scale factor 10, ~8GB
>>> database).
>>> I attached the script which I used to populate the data (you have to
>>> download, build and run tpch-dbgen yourself; you can also comment out
>>> the code related to VOPS).
>
> Cool, thanks!
>
>>> Unfortunately I failed to load data in zedstore:
>>>
>>> postgres=# insert into zedstore_lineitem_projection (select
>>> l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag::"char",l_linestatus::"char"
>>>
>>> from lineitem);
>>> psql: ERROR: compression failed. what now?
>>> Time: 237804.775 ms (03:57.805)
>
> Yeah, it's still early days, it will crash and burn in a lot of cases.
> We wanted to publish this early, to gather ideas and comments on the
> high level design, and to validate that the table AM API that's in v12
> is usable.
>
>> Looks like the original problem was caused by the internal Postgres
>> compressor: I had not configured Postgres to use lz4.
>> When I configured Postgres --with-lz4, the data was correctly inserted
>> into the zedstore table, but it looks like it is not compressed at all:
>>
>> postgres=# select pg_relation_size('zedstore_lineitem_projection');
>> pg_relation_size
>> ------------------
>> 9363010640
>
> The single-insert codepath isn't very optimized yet. If you populate
> the table with large "INSERT ... SELECT ...", you end up with a huge
> undo log. Try loading it with COPY.
>
> You can also see how many pages of each type there is with:
>
> select count(*), pg_zs_page_type('zedstore_lineitem_projection', g)
> from generate_series(0,
> pg_table_size('zedstore_lineitem_projection') / 8192 - 1) g group by 2;
>
> - Heikki
postgres=# copy zedstore_lineitem from '/mnt/data/lineitem.tbl'
delimiter '|' csv;
COPY 59986052
Time: 232802.257 ms (03:52.802)
postgres=# select pg_relation_size('zedstore_lineitem');
pg_relation_size
------------------
10346504192
(1 row)
postgres=# select count(*), pg_zs_page_type('zedstore_lineitem', g)
from generate_series(0, pg_table_size('zedstore_lineitem') / 8192 -
1) g group by 2;
count | pg_zs_page_type
---------+-----------------
1 | META
1262308 | BTREE
692 | UNDO
(3 rows)
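As a sanity check, the page counts above multiplied by the default 8 KB block size should reproduce the size reported by pg_relation_size. A quick sketch of that arithmetic (numbers taken from the output above):

```python
# Sanity check: the per-type page counts reported by pg_zs_page_type,
# times the default 8 KB PostgreSQL block size, should add up to the
# value returned by pg_relation_size.
BLCKSZ = 8192  # default PostgreSQL block size in bytes

page_counts = {"META": 1, "BTREE": 1262308, "UNDO": 692}
total_pages = sum(page_counts.values())
total_bytes = total_pages * BLCKSZ

print(total_pages)  # 1263001
print(total_bytes)  # 10346504192, matching pg_relation_size above
```

So essentially all of the ~10.3GB is BTREE pages, i.e. the attribute trees themselves, not undo bloat.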
And now performance is much worse:
Time: 99819.476 ms (01:39.819)
It is strange, because the main advantage of a columnar store is that it
has to fetch only the accessed columns.
What I see is that in non-parallel mode (max_parallel_workers_per_gather
= 0)
the backend consumes about 11GB of memory. It fits in my desktop RAM
(16GB) and the query takes ~58 seconds.
But once I start 4 parallel workers, they cause huge swapping:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
28195 knizhnik  20   0 11.823g 6.553g 5.072g D   7.6 42.2  0:17.19 postgres
28074 knizhnik  20   0 11.848g 6.726g 5.223g D   7.3 43.3  4:14.96 postgres
28192 knizhnik  20   0 11.854g 6.586g 5.075g D   7.3 42.4  0:17.18 postgres
28193 knizhnik  20   0 11.870g 6.594g 5.064g D   7.3 42.4  0:17.19 postgres
28194 knizhnik  20   0 11.854g 6.589g 5.078g D   7.3 42.4  0:17.09 postgres
which is also strange because data should be present in shared buffers.
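One way to check what actually sits in shared buffers is the pg_buffercache contrib extension. A sketch of such a query (assumes the extension is installed, the default 8 KB block size, and the 'zedstore_lineitem' table from the test above):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Count how many shared buffers currently hold pages of the table.
-- Run this in the same database that contains the table.
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'zedstore_lineitem'
GROUP BY c.relname;
```

If the cached size is far below the table size, each worker is reading (and privately buffering) pages itself rather than hitting shared buffers, which would explain the per-process RES figures above.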
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company