Re: [PATCH] Compression dictionaries for JSONB

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2022-07-12 12:15:17
Message-ID: CAJ7c6TOZSndusE695Cs_x64Fz2JD4dB2yh1mmBRkB4MmEBX8pw@mail.gmail.com
Lists: pgsql-hackers

Hi Nikita,

> Aleksander, please point me in the right direction if it was mentioned before, I have a few questions:

Thanks for your feedback. These are good questions indeed.

> 1) It is not clear for me, how do you see the life cycle of such a dictionary? If it is meant to keep growing without
> cleaning up/rebuilding it could affect performance in an undesirable way, along with keeping unused data without
> any means to get rid of them.
> 2) From (1) follows another question - I haven't seen any means for getting rid of unused keys (or any other means
> for dictionary cleanup). How could it be done?

Good point. This was not a problem for ZSON since the dictionary size
was limited to 2**16 entries, the dictionary was immutable, and the
dictionaries had versions. For compression dictionaries we removed the
2**16 entries limit and also decided to get rid of versions. The idea
was that you can simply continue adding new entries, but no one
considered that the memory required to decompress a document would
then grow without bound.

Maybe we should return to the idea of limited dictionary size and
versions. Objections?
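
To make the idea concrete, here is a minimal sketch of size-limited,
immutable dictionary versions. It is not code from the patch or from
ZSON; VersionedDictionary, MAX_ENTRIES and the method names are all
hypothetical, and a document is modelled as a plain list of keys.

```python
# Hypothetical sketch: none of these names come from the patch or from ZSON.
MAX_ENTRIES = 2**16  # per-version size limit, as in ZSON


class VersionedDictionary:
    """Append-only collection of immutable, size-limited dictionary versions."""

    def __init__(self):
        self._versions = []  # version id is the index; each version is a tuple

    def add_version(self, entries):
        """Freeze a new version and return its id; older versions are untouched."""
        frozen = tuple(dict.fromkeys(entries))  # drop duplicates, keep order
        if len(frozen) > MAX_ENTRIES:
            raise ValueError("dictionary version exceeds MAX_ENTRIES")
        self._versions.append(frozen)
        return len(self._versions) - 1

    def encode(self, keys, version):
        """Replace known keys with integer codes; unknown keys stay as strings."""
        codes = {key: code for code, key in enumerate(self._versions[version])}
        return version, [codes.get(key, key) for key in keys]

    def decode(self, version, encoded):
        """Decode using only the version the document was written with."""
        entries = self._versions[version]
        return [entries[item] if isinstance(item, int) else item for item in encoded]


d = VersionedDictionary()
v0 = d.add_version(["id", "name"])
doc = d.encode(["name", "id", "extra"], v0)
v1 = d.add_version(["id", "name", "tags"])  # documents written with v0 are unaffected
```

Since every document carries its version id, decompressing it needs
only that one version, which holds at most MAX_ENTRIES entries.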

> 4) If one dictionary is used by several tables - I see future issues in concurrent dictionary updates. This will for sure
> affect performance and can cause unpredictable behavior for queries.

You are right. Another reason to return to the idea of dictionary versions.

> Also, I agree with Simon Riggs, using OIDs from the general pool for dictionary entries is a bad idea.

Yep, we agreed to stop using OIDs for this, but the patch has not
been changed yet. Please don't hesitate to join the effort if you
want to. I wouldn't mind taking a short break from this patch.

> 3) Is the possible scenario legal - by some means a dictionary does not contain some keys for entries? What happens then?

No, we should either forbid removing dictionary entries or check that
no existing document uses the entries being removed.
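
The second option could look roughly like the sketch below. This is
only an illustration under simplifying assumptions: the dictionary is
a plain mapping from integer codes to keys, each document is a list of
the codes it references, and entries_in_use and remove_entries are
hypothetical names, not functions from the patch.

```python
# Hypothetical sketch: documents are modelled as lists of integer entry codes.
def entries_in_use(documents, candidates):
    """Return the subset of candidate codes referenced by any document."""
    candidates = set(candidates)
    used = set()
    for codes in documents:
        used |= candidates.intersection(codes)
    return used


def remove_entries(dictionary, documents, candidates):
    """Remove candidate codes from the dictionary only if no document uses them."""
    used = entries_in_use(documents, candidates)
    if used:
        raise ValueError(f"entries still in use: {sorted(used)}")
    for code in candidates:
        dictionary.pop(code, None)


dictionary = {0: "id", 1: "name", 2: "tags"}
documents = [[0, 1], [0]]
remove_entries(dictionary, documents, [2])  # code 2 is unused, so this succeeds
```

The check has to look at every document that uses the dictionary, so
in practice it would presumably be much more expensive than simply
forbidding removal.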

> If you have any questions on Pluggable TOAST don't hesitate to ask me and on JSONB Toaster you can ask Nikita Glukhov.

Will do! Thanks for working on this and I'm looking forward to the
next version of the patch for the next round of review.

--
Best regards,
Aleksander Alekseev
