Re: [PATCH] Compression dictionaries for JSONB

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2022-04-22 15:21:03
Message-ID: CALNJ-vShapaj2ZQxQ9+Gsua-MmgnBJh-Ruc0OzOWS83pySayNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 22, 2022 at 1:30 AM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:

> Hi hackers,
>
> This is a follow-up thread to `RFC: compression dictionaries for JSONB`
> [1]. I would like to share my current progress in order to get early
> feedback. The patch is currently in a draft state but implements the basic
> functionality. I did my best to account for all the great feedback I
> previously got from Alvaro and Matthias.
>
> Usage example:
>
> ```
> CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');
>
> SELECT '{"aaa":"bbb"}' :: mydict;
> mydict
> ----------------
> {"aaa": "bbb"}
>
> SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa';
> ?column?
> ----------
> "bbb"
> ```
>
> Here `mydict` works as a transparent replacement for `jsonb`. However, its
> internal representation differs. The provided dictionary entries ('aaa',
> 'bbb') are stored in the new catalog table:
>
> ```
> SELECT * FROM pg_dict;
> oid | dicttypid | dictentry
> -------+-----------+-----------
> 39476 | 39475 | aaa
> 39477 | 39475 | bbb
> (2 rows)
> ```
>
> When `mydict` sees 'aaa' in the document, it replaces it with the
> corresponding code, in this case - 39476. For more details regarding the
> compression algorithm and choosen compromises please see the comments in
> the patch.
>
> In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as
> TYPTYPE_BASE with only difference: corresponding `<type>_in`
> (pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc)
> procedures receive the dictionary Oid as a `typmod` argument. This way the
> procedures can distinguish `mydict1` from `mydict2` and use the proper
> compression dictionary.
>
> The approach with alternative `typmod` role is arguably a bit hacky, but
> it was the less invasive way to implement the feature I've found. I'm open
> to alternative suggestions.
>
> Current limitations (todo):
> - ALTER TYPE is not implemented
> - Tests and documentation are missing
> - Autocomplete is missing
>
> Future work (out of scope of this patch):
> - Support types other than JSONB: TEXT, XML, etc
> - Automatically updated dictionaries, e.g. during VACUUM
> - Alternative compression algorithms. Note that this will not require any
> further changes in the catalog, only the values we write to pg_type and
> pg_cast will differ.
>
> Open questions:
> - Dictionary entries are currently stored as NameData, the same type that
> is used for enums. Are we OK with the accompanying limitations? Any
> alternative suggestions?
> - All in all, am I moving the right direction?
>
> Your feedback is very much welcomed!
>
> [1]:
> https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com
>
> --
> Best regards,
> Aleksander Alekseev
>
Hi,
For src/backend/catalog/pg_dict.c, please add license header.

+ elog(ERROR, "skipbytes > decoded_size - outoffset");

Include the values for skipbytes, decoded_size and outoffset.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-04-22 15:55:10 Re: Re: fix cost subqueryscan wrong parallel cost
Previous Message Daniel Gustafsson 2022-04-22 14:56:38 Cryptohash OpenSSL error queue in FIPS enabled builds