Re: Add ZSON extension to /contrib/

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Add ZSON extension to /contrib/
Date: 2021-05-26 10:49:47
Message-ID: CAJ7c6TN0fEaMnxX9W1fU7Jau75trutx2+oR1cV+SssENRU4gzA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

Many thanks for your feedback, I very much appreciate it!

> If the extension is mature enough, why make it an extension in
> contrib, and not instead either enhance the existing jsonb type with
> it or make it a built-in type?

> IMO we have too d*mn many JSON types already. If we can find a way
> to shoehorn this optimization into JSONB, that'd be great. Otherwise
> I do not think it's worth the added user confusion.

Magnus, Tom,

My reasoning is that if the problem can be solved with an extension,
there is little reason to modify the core. This seems to be in the
spirit of PostgreSQL. If the community reaches a consensus to modify
the core to introduce a similar feature, we could discuss this as
well. It sounds like a lot of unnecessary work to me, though (see
below).

> * doesn't cover all cases, notably indexes.

Tom,

Not sure if I follow. What cases do you have in mind?

> Do note that e.g. postgis is not in contrib, but is available in e.g. RDS.

Matthias,

Good point. I suspect that PostGIS is an exception though...

> I like the idea of the ZSON type, but I'm somewhat disappointed by its
> current limitations

Several people suggested various enhancements right after learning
about ZSON. Time has shown, however, that no real-world users
actually need, e.g., more than one common dictionary per database. I
suspect this is because no one has more than 2**16 distinct repeated
strings (the limit of a single dictionary) in their documents. Thus
there is no benefit in having separate dictionaries and the
corresponding extra complexity.

> - Each dictionary uses a lot of memory, regardless of the number of
> actual stored keys. For 32-bit systems the base usage of a dictionary
> without entries ((sizeof(Word) + sizeof(uint16)) * 2**16) would be
> almost 1MB, and for 64-bit it would be 1.7MB. That is significantly
> more than I'd want to install.

You are probably right on this one; this part could be optimized. I
will address this if we agree on submitting the patch.
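
For reference, the arithmetic behind the quoted estimate can be
reproduced with a minimal sketch. The Word layout below is an
assumption made for illustration (the real struct in ZSON may differ),
and dict_base_bytes() and DICT_MAX_WORDS are hypothetical names. Only
the formula (sizeof(Word) + sizeof(uint16)) * 2**16 comes from the
quoted message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMED layout, for illustration only; the real ZSON struct may differ.
 * With typical padding this is 12 bytes on 32-bit and 24 bytes on 64-bit.
 */
typedef struct {
    uint16_t code;       /* dictionary code of the string */
    bool     check_next; /* hypothetical flag */
    size_t   nbytes;     /* length of the string */
    char    *word;       /* pointer to the string itself */
} Word;

/* One dictionary can hold at most 2**16 entries (codes are 16-bit). */
#define DICT_MAX_WORDS (1u << 16)

/*
 * Base memory of an empty dictionary, following the quoted formula:
 * (sizeof(Word) + sizeof(uint16)) * 2**16.
 * word_size is passed in so other platforms' sizes can be tried as well.
 */
static size_t dict_base_bytes(size_t word_size)
{
    return (word_size + sizeof(uint16_t)) * DICT_MAX_WORDS;
}
```

Under these assumptions, dict_base_bytes(12) gives 917504 bytes (just
under 1MB) and dict_base_bytes(24) gives 1703936 bytes (about 1.7MB),
which matches the numbers quoted above.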

> - You call gettimeofday() in both dict_get and in get_current_dict_id.
> These functions can be called in short and tight loops (for small ZSON
> fields), in which case it would add significant overhead through the
> implied syscalls.

I must admit, I'm not an expert in this area. My understanding is that
gettimeofday() is implemented as a single virtual memory access on
modern operating systems, e.g. via the vDSO on Linux, and is thus very
cheap. I'm not so sure about other supported platforms, though.
Probably worth investigating.
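
One way to investigate this on a given platform is a rough
micro-benchmark. The sketch below is not part of ZSON;
gettimeofday_avg_ns() is a hypothetical helper, and it assumes a POSIX
system that provides clock_gettime() with CLOCK_MONOTONIC. It only
measures the average cost of a gettimeofday() call in a tight loop, so
the result says nothing about cache effects in a real backend.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Returns the average wall-clock cost of one gettimeofday() call,
 * in nanoseconds, measured over the given number of iterations.
 * Hypothetical helper for illustration; iterations must be > 0.
 */
static double gettimeofday_avg_ns(long iterations)
{
    struct timespec start, end;
    struct timeval tv;
    volatile long sink = 0; /* keeps the compiler from dropping the loop */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&tv, NULL);
        sink += tv.tv_usec;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling, say, gettimeofday_avg_ns(10000000) on each supported platform
should show whether the call is answered from user space (typically
tens of nanoseconds) or goes through a real syscall (typically
noticeably more).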

> It does mean that you're deTOASTing
> the full ZSON field, and that the stored bytestring will not be
> structured / doesn't work well with current debuggers.

Unfortunately, I'm not very familiar with debugging tools in this
context. Could you please name the debuggers I should take into
account?

> We (2ndQuadrant, now part of EDB) made some enhancements to Zson a few years ago, and I have permission to contribute those if this proposal is adopted.

Andrew,

That's great, and personally I very much like the enhancements you've
made. Purely out of curiosity, did they end up as a part of
2ndQuadrant / EDB products? I will be happy to accept a pull request
with these enhancements regardless of how the story with this proposal
ends up.

> Quite so. To some extent it's a toy. But at least one of our customers
> has found it useful, and judging by Aleksander's email they aren't
> alone.

Indeed, this is an extremely simple extension, ~500 effective lines of
code in C. It addresses a somewhat specific scenario, which, to my
regret, doesn't seem to be uncommon. A painkiller of sorts. In an
ideal world, people are supposed to simply normalize their data.

--
Best regards,
Aleksander Alekseev
