Re: Add ZSON extension to /contrib/

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add ZSON extension to /contrib/
Date: 2021-05-26 15:11:52
Message-ID: 4c6433d3-1e28-1a44-6dd5-2d1a8d424b24@garret.ru
Lists: pgsql-hackers

On 25.05.2021 13:55, Aleksander Alekseev wrote:
> Hi hackers,
>
> Back in 2016 while being at PostgresPro I developed the ZSON extension
> [1]. The extension introduces the new ZSON type, which is 100%
> compatible with JSONB but uses a shared dictionary of strings most
> frequently used in given JSONB documents for compression. These
> strings are replaced with integer IDs. Afterward, PGLZ (and now LZ4)
> applies if the document is large enough by common PostgreSQL logic.
> Under certain conditions (many large documents), this saves disk
> space, memory and increases the overall performance. More details can
> be found in README on GitHub.
>
> The extension was accepted warmly and instantaneously I got several
> requests to submit it to /contrib/ so people using Amazon RDS and
> similar services could enjoy it too. Back then I was not sure if the
> extension is mature enough and if it lacks any additional features
> required to solve the real-world problems of the users. Time showed,
> however, that people are happy with the extension as it is. There were
> several minor issues discovered, but they were fixed back in 2017. The
> extension never experienced any compatibility problems with the next
> major release of PostgreSQL.
>
> So my question is if the community may consider adding ZSON to
> /contrib/. If this is the case I will add this thread to the nearest
> CF and submit a corresponding patch.
>
> [1]: https://github.com/postgrespro/zson
>
> --
> Best regards,
> Aleksander Alekseev
> Open-Source PostgreSQL Contributor at Timescale

Yet another approach to the same problem:

https://github.com/postgrespro/jsonb_schema

Instead of compressing JSON documents, we can try to detect the JSON
schema automatically (the names and types of the JSON fields) and store
it separately from the values. This approach is closer to the one used
in schema-less databases. It is most efficient when many JSON records
share the same schema and the keys are comparable in size to the values.
On the IMDB data set it reduces the database size by a factor of about 1.7.
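
To illustrate the idea, here is a minimal Python sketch. It is not how
jsonb_schema is implemented; the names SchemaStore, split_document and
join_document are hypothetical, and arrays are simply treated as opaque
leaf values. Each document is split into a schema (key names and value
types) and a flat list of values, and identical schemas are stored only
once and referenced by ID.

```python
def split_document(doc):
    """Split a JSON object into (schema, values).

    schema is a tuple of (key, kind) pairs in sorted key order, where kind
    is a type name for leaf values or a nested schema tuple for objects.
    values is a flat list of the leaf values in the same order.
    """
    schema = []
    values = []
    for key in sorted(doc):
        value = doc[key]
        if isinstance(value, dict):
            sub_schema, sub_values = split_document(value)
            schema.append((key, sub_schema))
            values.extend(sub_values)
        else:
            # Arrays and scalars are both kept as opaque leaf values here.
            schema.append((key, type(value).__name__))
            values.append(value)
    return tuple(schema), values


def _join(schema, value_iter):
    """Rebuild one object, consuming leaf values from a shared iterator."""
    doc = {}
    for key, kind in schema:
        if isinstance(kind, tuple):
            doc[key] = _join(kind, value_iter)
        else:
            doc[key] = next(value_iter)
    return doc


def join_document(schema, values):
    """Inverse of split_document."""
    return _join(schema, iter(values))


class SchemaStore:
    """Keeps every distinct schema once and hands out integer IDs."""

    def __init__(self):
        self._ids = {}       # schema -> id
        self._schemas = []   # id -> schema

    def encode(self, doc):
        schema, values = split_document(doc)
        schema_id = self._ids.get(schema)
        if schema_id is None:
            schema_id = len(self._schemas)
            self._ids[schema] = schema_id
            self._schemas.append(schema)
        return schema_id, values

    def decode(self, schema_id, values):
        return join_document(self._schemas[schema_id], values)

    def schema_count(self):
        return len(self._schemas)
```

With this layout, two records that have the same keys and value types
share one stored schema, so each record carries only a schema ID and its
values. That is why the gain depends on how many records share a schema
and on how large the keys are relative to the values.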
