Re: Add ZSON extension to /contrib/

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Joel Jacobson <joel(at)compiler(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject: Re: Add ZSON extension to /contrib/
Date: 2021-07-10 18:47:49
Message-ID: d524cc65-6d45-2ffd-fc9b-eb60b964d02e@enterprisedb.com
Lists: pgsql-hackers

On 7/3/21 12:34 PM, Peter Eisentraut wrote:
> On 04.06.21 17:09, Aleksander Alekseev wrote:
>> I decided to add the patch to the nearest commitfest.
>
> With respect to the commit fest submission, I don't think there is
> consensus right now to add this.  I think people would prefer that this
> dictionary facility be somehow made available in the existing JSON
> types.  Also, I sense that there is still some volatility about some of
> the details of how this extension should work and its scope.  I think
> this is served best as an external extension for now.

I agree there are a lot of open questions to figure out, but I think
this "column-level compression" capability has a lot of potential - not
just for structured documents like JSON, but maybe even for scalar types.

I don't think the question of whether this should be built into jsonb,
be a separate built-in type, a contrib type, or something external is
the one we need to answer first.

The first thing I'd like to see is some "proof" that it's actually
useful in practice - there were claims about people/customers using it
and being happy with the benefits, but no actual examples of data sets
that are expected to benefit, the compression ratios achieved, etc. And
considering that [1] went unnoticed for 5 years, I have my doubts about
it being used very widely. (I may be wrong, and maybe people are just
not casting jsonb to zson.)
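
If someone wants to collect such numbers, a rough per-value comparison
could be as simple as this (table/column names are placeholders; the
stored jsonb may already be TOAST-compressed while the freshly computed
zson value is not, so treat it as a ballpark estimate only):

    -- compare datum sizes of jsonb values vs. their zson encoding
    -- on a ~1% sample of a hypothetical documents(doc jsonb) table
    SELECT avg(pg_column_size(doc))       AS avg_jsonb_bytes,
           avg(pg_column_size(doc::zson)) AS avg_zson_bytes
    FROM documents TABLESAMPLE SYSTEM (1);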

I've tried this on the one large non-synthetic JSONB data set I had at
hand, the bitcoin blockchain. That's ~1TB as JSONB, and when I tried
ZSON instead there was no measurable benefit - in fact, the database
got a bit larger. But I admit btc data is rather unusual, because it
contains a lot of randomness (all the tx and block IDs are
random-looking hashes, etc.), and there are a lot of them in each
document. So maybe that's simply a data set that can't benefit from
zson in principle.
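
For context, the conversion was roughly along these lines (placeholder
names; a dictionary is trained first, see the learning step discussed
below):

    -- size before
    SELECT pg_size_pretty(pg_total_relation_size('blocks'));

    -- rewrite the column from jsonb to zson
    ALTER TABLE blocks ALTER COLUMN block TYPE zson USING block::zson;

    -- size after (this is where I saw no benefit)
    SELECT pg_size_pretty(pg_total_relation_size('blocks'));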

I also suspect zson_extract_strings() is pretty inefficient, and I ran
into various issues with the btc blocks, which have very many keys -
often far more than the 10k dictionary limit.
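
(A quick lower-bound check for how close a data set gets to that limit -
this only counts top-level keys, so for nested documents like btc blocks
the real number of distinct strings is much higher:)

    -- hypothetical blocks(block jsonb) table; the zson dictionary
    -- holds at most ~10k strings, so a document whose key count alone
    -- exceeds that can never be fully covered
    SELECT max(k.cnt) AS max_top_level_keys
    FROM blocks b,
         LATERAL (SELECT count(*) AS cnt
                  FROM jsonb_object_keys(b.block)) k;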

In any case, I think having clear examples of practical data sets that
benefit from zson would be very useful, both to guide development and
to show what the potential gains are.

The other thing is that a lot of this is manual (e.g. the dictionary
learning) and not really integrated with the core. IMO building the
necessary infrastructure for this would help all the possible variants
(built-in type, contrib, external extension).
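
To illustrate the "manual" part - with the current extension the whole
learning step is something the user has to run by hand (signature as
per the README, if I read it correctly):

    -- (re)build the shared dictionary from the listed table/column
    -- pairs; nothing is triggered automatically as the data changes,
    -- and existing rows keep the dictionary they were encoded with
    SELECT zson_learn('{{"blocks", "block"}}');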

regards

[1]
https://github.com/postgrespro/zson/commit/02db084ea3b94d9e68fd912dea97094634fcdea5

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
