Re: [PATCH] Compression dictionaries for JSONB

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2023-10-12 10:28:48
Message-ID: CAJ7c6TPSN06C+5cYSkyLkQbwN1C+pUNGmx+VoGCA-SPLCszC8w@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

I would like to continue discussing compression dictionaries.

> So I summarized the requirements we agreed on so far and ended up with
> the following list: [...]

Again, here is the summary of our current agreements, at least as I
understand them. Please feel free to correct me where I'm wrong.

We are going to focus on supporting the:

```
SET COMPRESSION lz4 [WITH|WITHOUT] DICTIONARY
```

... syntax for now. From the UI perspective the rest of the agreements
didn't change compared to the previous summary.

In the [1] discussion (cc: Robert) we agreed to use va_tag != 18 for
the on-disk TOAST pointer representation to make TOAST pointers
extendable. If va_tag has a different value (currently it's always
18), the TOAST pointer is followed by a utf8-like varint bitmask.
This bitmask determines the rest of the content of the TOAST pointer
and its overall size. This will allow us to extend TOAST pointers to
include dictionary_id and also to extend them in the future, e.g. to
support ZSTD and other compression algorithms, use 64-bit TOAST
pointers, etc.
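
To make the bitmask idea more concrete, here is a minimal sketch in C of
how such a varint could be encoded and decoded. Nothing here is taken
from a patch. The prefix scheme (the number of leading 1 bits in the
first byte gives the number of extra bytes), the TOAST_EXT_* flag names
and values, the 4-byte dictionary_id, and the ext_* helper names are all
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define TOAST_EXT_HAS_DICTIONARY_ID (1u << 0)
#define TOAST_EXT_ZSTD              (1u << 1)
#define TOAST_EXT_64BIT_POINTER     (1u << 2)

/*
 * Encode mask as a prefix varint (assumed scheme):
 *   0xxxxxxx                      7 payload bits, 1 byte
 *   10xxxxxx + 1 byte            14 payload bits, 2 bytes
 *   110xxxxx + 2 bytes           21 payload bits, 3 bytes
 *   1110xxxx + 3 bytes           28 payload bits, 4 bytes
 * Returns the number of bytes written, or 0 if the value does not fit
 * in 28 bits or the buffer is too small.
 */
static size_t
ext_bitmask_encode(uint32_t mask, uint8_t *buf, size_t cap)
{
    size_t extra;

    assert(buf != NULL);
    if (mask < (1u << 7))       extra = 0;
    else if (mask < (1u << 14)) extra = 1;
    else if (mask < (1u << 21)) extra = 2;
    else if (mask < (1u << 28)) extra = 3;
    else return 0;
    if (cap < extra + 1)
        return 0;

    /* Low-order bytes go last (big-endian payload). */
    for (size_t i = extra; i >= 1; i--)
    {
        buf[i] = (uint8_t) (mask & 0xFF);
        mask >>= 8;
    }
    /* First byte: `extra` leading 1 bits, a 0 bit, then the top payload bits. */
    buf[0] = (uint8_t) (((0xFF00u >> extra) & 0xFF) | mask);
    return extra + 1;
}

/*
 * Decode a prefix varint produced by ext_bitmask_encode().
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or uses a prefix this sketch does not define.
 */
static size_t
ext_bitmask_decode(const uint8_t *buf, size_t len, uint32_t *mask)
{
    size_t   extra = 0;
    uint32_t value;

    assert(buf != NULL && mask != NULL);
    if (len == 0)
        return 0;
    /* Count leading 1 bits in the first byte. */
    while (extra < 4 && (buf[0] & (0x80u >> extra)))
        extra++;
    if (extra > 3 || len < extra + 1)
        return 0;

    value = buf[0] & (0x7Fu >> extra);
    for (size_t i = 1; i <= extra; i++)
        value = (value << 8) | buf[i];
    *mask = value;
    return extra + 1;
}

/*
 * Extra bytes implied by the bitmask, on top of the regular pointer.
 * Assumption: dictionary_id is stored as a 4-byte value.
 */
static size_t
ext_pointer_extra_size(uint32_t mask)
{
    size_t size = 0;

    if (mask & TOAST_EXT_HAS_DICTIONARY_ID)
        size += sizeof(uint32_t);
    return size;
}
```

With this scheme a pointer that only carries a dictionary_id pays one
byte for the bitmask, and the decoder can reject an unknown prefix
instead of misreading the rest of the pointer.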

Several things occurred to me:

- Does anyone believe that va_tag should be part of the utf8-like
bitmask in order to save a byte or two?

- The described approach means that compression dictionaries are not
going to be used when data is compressed in-place (i.e. within a
tuple), since no TOAST pointer is involved in this case. Also we will
be unable to add additional compression algorithms here. Does anyone
have problems with this? Should we use the reserved compression
algorithm id instead as a marker of an extended TOAST?

- It would be nice to decompose the feature into several independent
patches, e.g. modify TOAST first, then add compression dictionaries
without automatic update of the dictionaries, then add the automatic
update. I find it difficult to imagine however how to modify TOAST
pointers and test the code properly without a dependency on a larger
feature. Could anyone think of a trivial test case for extendable
TOAST? Maybe something we could add to src/test/modules similarly to
how we test SLRU, background workers, etc.

[1]: https://www.postgresql.org/message-id/flat/CAN-LCVMq2X%3Dfhx7KLxfeDyb3P%2BBXuCkHC0g%3D9GF%2BJD4izfVa0Q%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
