Re: ZSON, PostgreSQL extension for compressing JSONB

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ZSON, PostgreSQL extension for compressing JSONB
Date: 2016-10-06 10:03:54
Message-ID: 20161006094859.GA22564@e733.localdomain
Lists: pgsql-general

Hello, Eduardo.

> Why do you use a dictionary compression and not zlib/lz4/bzip/anyother?

Internally PostgreSQL already has an LZ77-family algorithm, PGLZ. I didn't
try to replace it, only to supplement it. PGLZ compresses every piece of
data (JSONB documents in this case) independently. What I did is remove
redundant data that exists between documents and that PGLZ can't
compress, since every single document usually uses each key and similar
strings (some sort of string tags in arrays, etc.) only once.
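To make the idea concrete, here is a toy sketch (plain Python, not ZSON's actual C code or on-disk format; all names are illustrative): keys that repeat across many documents are replaced by short ids from a shared dictionary, so the per-document compressor no longer sees the redundant strings.

```python
# Toy illustration of a shared-dictionary encoding (NOT ZSON's real code):
# long keys repeated across documents are replaced by short integer ids.

import json

def build_dictionary(docs):
    """Collect the keys used across all documents into one shared dict."""
    keys = sorted({k for doc in docs for k in doc})
    return {k: i for i, k in enumerate(keys)}

def encode(doc, dictionary):
    """Replace string keys with short integer ids."""
    return {dictionary[k]: v for k, v in doc.items()}

def decode(enc, dictionary):
    """Restore the original keys from the shared dictionary."""
    reverse = {i: k for k, i in dictionary.items()}
    return {reverse[i]: v for i, v in enc.items()}

docs = [
    {"customer_name": "alice", "order_total": 10},
    {"customer_name": "bob", "order_total": 20},
]
d = build_dictionary(docs)
enc = encode(docs[0], d)
assert decode(enc, d) == docs[0]
# The encoded form is shorter because the long repeated keys became ids:
assert len(json.dumps(enc)) < len(json.dumps(docs[0]))
```

A per-document compressor like PGLZ can't exploit this redundancy on its own, because each key typically occurs only once inside any single document; the shared dictionary moves that redundancy out of the documents entirely.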

> Compress/Decompress speed?

In my observations PGLZ has characteristics similar to GZIP. I didn't
benchmark ZSON encoding/decoding separately from the DBMS, because the
end user is interested only in TPS, which depends on IO, the number of
documents we can fit into memory, and other factors.

> As I understand, postgresql must decompress before use.

Only if you try to read document fields. For deleting a tuple, doing
vacuum, etc. there is no need to decompress the data.

> Some compressing algs (dictionary transforms where a token is word)
> allow search for tokens/words directly on compressed data transforming
> the token/word to search in dictionary entry and searching it in
> compressed data. From it, replace, substring, etc... string
> manipulations algs at word level can be implemented.

Unfortunately, I doubt that the current ZSON implementation can use these
ideas. However, I must agree that it's a very interesting field of
research. I don't think anyone has tried to do something like this in
PostgreSQL yet.
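For readers following along, the idea Eduardo describes could look roughly like this (a hypothetical Python sketch, not anything ZSON implements): translate the search word into its dictionary id once, then scan the encoded token stream for that id without ever decoding it back to text.

```python
# Hypothetical sketch of searching dictionary-encoded data directly
# (NOT part of ZSON): one dictionary lookup, then a scan over ids.

def make_dict(corpus_words):
    """Map each distinct word to a small integer id."""
    return {w: i for i, w in enumerate(sorted(set(corpus_words)))}

def encode_text(text, d):
    """Encode a text as a list of dictionary ids."""
    return [d[w] for w in text.split()]

def contains_word(encoded, word, d):
    """Search the encoded data without decompressing it."""
    wid = d.get(word)
    return wid is not None and wid in encoded

words = "the quick brown fox jumps over the lazy dog".split()
d = make_dict(words)
enc = encode_text("the quick brown fox", d)
assert contains_word(enc, "fox", d)
assert not contains_word(enc, "cat", d)
```

The same trick extends to replace and substring-style operations at the word level, since they reduce to manipulating ids instead of strings.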

> My passion is compression, do you care if I try other algorithms? For
> that, some dict id numbers (>1024 or >1<<16 or <128 for example) say
> which compression algorithm is used or must change zson_header to store
> that information. Doing that, each document could be compressed with
> the best compressor (size or decompression speed) at idle times or at
> request.

By all means! Naturally, if you find a better encoding I would be happy
to merge the corresponding code into ZSON's repository.
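Eduardo's proposal of tagging each document with the compressor used could be sketched like this (hypothetical Python, not ZSON's actual header layout; the one-byte tag and codec ids are made up for illustration):

```python
# Hypothetical sketch (NOT ZSON's on-disk format): a one-byte header
# records which codec compressed the payload, so each document can be
# stored with whichever algorithm suits it best.

import zlib

ALGO_RAW, ALGO_ZLIB = 0, 1  # made-up codec ids for this sketch

def compress(payload: bytes, algo: int) -> bytes:
    """Prefix the (possibly) compressed payload with a codec id byte."""
    if algo == ALGO_ZLIB:
        return bytes([ALGO_ZLIB]) + zlib.compress(payload)
    return bytes([ALGO_RAW]) + payload

def decompress(blob: bytes) -> bytes:
    """Dispatch on the header byte to the right decoder."""
    algo, payload = blob[0], blob[1:]
    if algo == ALGO_ZLIB:
        return zlib.decompress(payload)
    return payload

doc = b'{"key": "value"}' * 100
blob = compress(doc, ALGO_ZLIB)
assert decompress(blob) == doc
assert len(blob) < len(doc)  # zlib shrinks the repetitive payload
```

A background job could then recompress documents with the best codec for size or decompression speed during idle time, exactly as Eduardo suggests.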

> Thanks for sharing and time.

Thanks for feedback and sharing your thoughts!

--
Best regards,
Aleksander Alekseev
