Re: Zedstore - compressed in-core columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Jacob Champion <pchampion(at)vmware(dot)com>
Cc: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, "Alexandra Wang (Pivotal)" <lewang(at)pivotal(dot)io>, "Taylor Vesely (Pivotal)" <tvesely(at)pivotal(dot)io>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Ashwin Agrawal (Pivotal)" <aagrawal(at)pivotal(dot)io>, DEV_OPS <devops(at)ww-it(dot)cn>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2020-11-13 22:00:56
Message-ID: ee13744d-702c-5891-4a49-9f1c5d29c6a7@enterprisedb.com
Lists: pgsql-hackers

On 11/13/20 8:07 PM, Jacob Champion wrote:
> On Nov 12, 2020, at 2:40 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> Hi,
>>
>> Thanks for the updated patch. It's quite a massive amount of code - I
>> don't think we had many 2MB patches in the past, so this is by no means
>> a full review.
>
> Thanks for taking a look! You're not kidding about the patch size.
>
> FYI, the tableam changes made recently have been extracted into their
> own patch, which is up at [1].
>
>> 1) the psql_1.out is missing a bit of expected output (due to 098fb0079)
>
> Yeah, this patch was rebased as of efc5dcfd8a.
>
>> 2) I'm getting crashes in intarray contrib, due to hitting this error in
>> lwlock.c (backtrace attached):
>>
>> /* Ensure we will have room to remember the lock */
>> if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
>> elog(ERROR, "too many LWLocks taken");
>>
>> I haven't investigated this too much, but it's a regular build with
>> asserts and TAP tests, so it should be simple to reproduce using "make
>> check-world" I guess.
>
> I've only seen this intermittently in installcheck, and I'm not able to
> reproduce with the intarray tests on my machine (macOS). Definitely
> something we need to look into. What OS are you testing on?
>

Fedora 32, nothing special. I'm not sure if I ran the tests with pglz or
lz4 - maybe there's some dependence on that - but it does fail for me
quite reliably with this:

./configure --enable-debug --enable-cassert --enable-tap-tests
--with-lz4 && make -s clean && make -s -j4 && make check-world

>> It's mostly expected that lz4 beats pglz in performance and compression
>> ratio, but this seems a bit too extreme, I guess. Per past benchmarks
>> (e.g. [1] and [2]) the difference in compression/decompression time
>> should be maybe 1-2x or something like that, not 35x like here.
>
> Yeah, something seems off about that. We'll take a look.
>
>> BTW the comments in general need updating and tidying up, to make
>> reviews easier. For example the merge_attstream comment references
>> attstream1 and attstream2, but those are not the current parameters of
>> the function.
>
> Agreed.
>
>> 5) IMHO there should be a #define specifying the maximum number of items
>> per chunk (60). Currently there are literal constants used in various
>> places, sometimes 60, sometimes 59, etc., which makes it harder to
>> understand the code. FWIW 60 seems a bit low, but maybe it's OK.
>
> Yeah, that seems like a good idea.
>
> I think the value 60 comes from the use of simple-8b encoding -- see the
> comment at the top of zedstore_attstream.c.
>

Yeah, I understand where it comes from. I'm just saying that when you
see 59 hardcoded, it may not be obvious where it came from, and
something like ITEMS_PER_CHUNK would be better.

I wonder how complicated it would be to allow larger chunks, e.g. by
using one bit to say "there's another 64-bit codeword" - see the sketch
below. Not sure if it's worth the extra complexity, though - it's just
that 60 feels a bit low.
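
To illustrate the idea (a rough sketch only - the names are made up, and
where exactly the flag bit would live, given the 4-bit selector, is an
open question):

    /*
     * Hypothetical: reserve one bit of each 64-bit simple-8b codeword
     * as a "continuation" flag, chaining another codeword for the same
     * chunk instead of capping chunks at 60 items.
     */
    #define ZS_ITEMS_PER_CHUNK      60
    #define ZS_CHUNK_CONTINUATION   (UINT64CONST(1) << 63)

    static inline bool
    zs_chunk_has_continuation(uint64 codeword)
    {
        return (codeword & ZS_CHUNK_CONTINUATION) != 0;
    }

Encoding would then emit as many codewords as needed per chunk, setting
the flag on all but the last one.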

>> 6) I do think ZSAttStream should track which compression is used by the
>> stream, for two main reasons. Firstly, there's another patch to support
>> "custom compression" methods, which (also) allows multiple compression
>> methods per column. It'd be a bit strange to support that for varlena
>> columns in heap tables, and not here, I guess. Secondly, I think one of
>> the interesting columnstore features down the road will be execution on
>> compressed data, which however requires compression method designed for
>> that purpose, and it's often datatype-specific (delta encoding, ...).
>>
>> I don't think we need to go as far as supporting "custom" compression
>> methods here, but I think we should allow different built-in compression
>> methods for different attstreams.
>
> Interesting. We'll need to read/grok that ML thread.
>

That thread is a bit long; I'm not sure it's worth reading as a whole
unless you want to work on that feature. The gist is that to seamlessly
support multiple compression algorithms we need to store an ID of the
algorithm somewhere. For TOAST that's not too difficult - we can do that
in the TOAST pointer - the main challenge is doing it in a
backwards-compatible way. For zedstore we can actually design it from
the start.

I wonder if we should track a version of the format somewhere, to allow
future improvements - so that if/when we decide to change something in
the future, we don't have to scavenge bits etc. Or perhaps just a
"uint32 flags" field, unused/reserved for future use.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
