From: Andres Freund <andres(at)anarazel(dot)de>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2023-02-05 14:50:50
Message-ID: 20230205145050.c2d7jdzhf3w2cslm@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2023-02-05 13:41:17 +0300, Aleksander Alekseev wrote:
> > I don't think the approaches in either of these threads are
> > promising. They add a lot of complexity, require implementation
> > effort for each type, manual work by the administrator per column, etc.
>
> I would like to point out that compression dictionaries don't require
> per-type work.
>
> The current implementation is artificially limited to JSONB because it's
> a PoC. I was hoping to get more feedback from the community before
> proceeding further. Internally it uses type-agnostic compression and
> doesn't care whether it compresses JSON(B), XML, TEXT, BYTEA or
> arrays. This choice was explicitly done in order to support types
> other than JSONB.
I don't think we'd want much of the infrastructure introduced in the
patch for type-agnostic cross-row compression. A dedicated "dictionary"
type as a wrapper around other types IMO is the wrong direction. This
should be a relation-level optimization option, possibly automatic, not
something visible to every user of the table.
I assume that manually specifying dictionary entries is a consequence of
the prototype state? I don't think this is something humans are very
good at, just analyzing the data to see what's useful to dictionarize
seems more promising.
I also suspect that we'd have to spend a lot of effort to make
compression/decompression fast if we want to handle dictionaries
ourselves, rather than using the dictionary support in libraries like
lz4/zstd.
> > One of the major justifications for work in this area is the cross-row
> > redundancy for types like jsonb. I think there's ways to improve that
> > across types, instead of requiring per-type work.
>
> To be fair, there are advantages in using type-aware compression. The
> compression algorithm can be more efficient than a general one, and in
> theory one can implement lazy decompression, e.g. decompressing only
> the accessed fields of a JSONB document.
> I agree though that particularly for PostgreSQL this is not
> necessarily the right path, especially considering the accompanying
> complexity.
I agree with both those paragraphs.
> above. However, having a built-in type-agnostic dictionary compression
> IMO is too attractive an idea to ignore completely. Especially
> considering the fact that the implementation was proven to be fairly
> simple and there was even no need to rebase the patch since November
> :)
I don't think a prototype-y patch not needing a rebase for two months is
a good measure of complexity :)
Greetings,
Andres Freund