Re: [PATCH] Compression dictionaries for JSONB

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nikita Malakhov <hukutoc(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2023-02-07 13:39:45
Message-ID: CAJ7c6TN11=H-L1jT8YC8zpmEW0UVuh9D6xp47ajgcWWv+q16GQ@mail.gmail.com

Hi,

> > The complexity of page-level compression is significant, as pages are
> > currently a base primitive of our persistency and consistency scheme.
>
> +many
>
> It's also not all a panacea performance-wise, datum-level decompression can
> often be deferred much longer than page level decompression. For things like
> json[b], you'd hopefully normally have some "pre-filtering" based on proper
> columns, before you need to dig into the json datum.

This is actually a good point.

> It's also not necessarily that good, compression ratio wise. Particularly for
> wider datums you're not going to be able to remove much duplication, because
> there's only a handful of tuples. Consider the case of json keys - the
> dictionary will often do better than page level compression, because it'll
> have the common keys in the dictionary, which means the "full" keys never will
> have to appear on a page, whereas page-level compression will have the keys on
> it, at least once.

To clarify, what I meant was applying the idea of compression with
shared dictionaries to pages instead of to individual tuples. Just to
make sure we are on the same page.

> Page-level compression can not compress patterns that have a length of
> more than 1 page. TOAST is often used to store values larger than 8kB,
> which we'd prefer to compress to the greatest extent possible. So, a
> value-level compression method specialized to the type of the value
> does make a lot of sense, too.

Let's not forget that a TOAST table is a table too. Page-level
compression would apply to it just as it does to a regular one.

> Of course you can use a dictionary for page-level compression too, but the
> gains when it works well will often be limited, because in most OLTP usable
> page-compression schemes I'm aware of, you can't compress a page all that far
> down, because you need a small number of possible "compressed page sizes".

That's true. However, compressing an 8 KB page to, let's say, 1 KB
is not a bad result either.
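
Just to illustrate the "small number of possible compressed page
sizes" point, here is a sketch (the bucket sizes and the function name
below are made up, not taken from any real implementation): the
compressed page has to be rounded up to one of a few supported sizes,
so any savings in between are lost.

static size_t
round_to_page_bucket(size_t compressed_size)
{
    /* hypothetical set of supported compressed page sizes */
    static const size_t buckets[] = {1024, 2048, 4096};
    int     nbuckets = sizeof(buckets) / sizeof(buckets[0]);

    for (int i = 0; i < nbuckets; i++)
    {
        if (compressed_size <= buckets[i])
            return buckets[i];
    }
    return 8192;        /* doesn't fit any bucket: keep it uncompressed */
}

So a page that compresses to 1.5 KB still occupies 2 KB on disk, and
one that compresses to 4.5 KB saves nothing at all.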

In any case, there seem to be advantages and disadvantages to either
approach. Personally I don't care that much which one we choose. In
fact, although my own patch proposed attribute-level compression
rather than tuple-level compression, it is arguably closer to the
tuple-level approach than to the page-level one. So to a certain
extent I would be contradicting myself by trying to prove that
page-level compression is the way to go. Also, Matthias has a
reasonable concern that page-level compression may have implications
for WAL size. (Maybe it won't, but I'm not ready to prove that right
now, nor am I convinced the concern necessarily holds.)

So, let's focus on tuple-level compression then.

> > > More similar data you compress the more space and disk I/O you save.
> > > Additionally you don't have to compress/decompress the data every time
> > > you access it. Everything that's in shared buffers is uncompressed.
> > > Not to mention the fact that you don't care what's in pg_attribute,
> > > the fact that schema may change, etc. There is a table and a
> > > dictionary for this table that you refresh from time to time. Very
> > > simple.
> >
> > You cannot "just" refresh a dictionary used once to compress an
> > object, because you need it to decompress the object too.
>
> Right. That's what I was trying to refer to when mentioning that we might need
> to add a bit of additional information to the varlena header for datums
> compressed with a dictionary.

> [...]
> and when you have many - updating an existing dictionary requires
> going through all objects compressed with it in the whole database.
> It's a very tricky question how to implement this feature correctly.

Yep, that's one of the challenges.

One approach would be to extend the existing dictionary. I'm not
sure whether ZSTD / LZ4 support this; they probably don't. In any
case this is a sub-optimal approach, because the dictionary would
grow indefinitely.
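
For reference, here is a minimal sketch of how ZSTD's dictionary API
is typically used (plain zstd.h / zdict.h, not code from the patch):
the dictionary is trained from a buffer of sample datums and then has
to be passed to every compress and decompress call. As far as I can
tell there is no call for growing an existing dictionary in place,
only for training a new one.

#include <zstd.h>
#include <zdict.h>

/*
 * Train a shared dictionary from concatenated sample datums. The same
 * dictionary must later be available at decompression time.
 */
static size_t
train_dict(void *dict_buf, size_t dict_capacity,
           const void *samples, const size_t *sample_sizes,
           unsigned nsamples)
{
    return ZDICT_trainFromBuffer(dict_buf, dict_capacity,
                                 samples, sample_sizes, nsamples);
}

/* Compress one datum using a previously trained dictionary. */
static size_t
compress_with_dict(void *dst, size_t dst_capacity,
                   const void *src, size_t src_size,
                   const void *dict, size_t dict_size)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      result;

    result = ZSTD_compress_usingDict(cctx, dst, dst_capacity,
                                     src, src_size,
                                     dict, dict_size,
                                     ZSTD_CLEVEL_DEFAULT);
    ZSTD_freeCCtx(cctx);
    return result;
}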

We could create a dictionary once per table and forbid modifying it.
Users would have to re-create and refill the table manually (e.g.
with `INSERT INTO .. SELECT ..`) whenever they want to update the
dictionary. Although this is a possible solution, I don't think it's
what Andres meant above by being invisible to the user. It would also
mean that the new dictionary has to be trained on the old table
before the new table is created with it, which is awkward.

This is why we need something like dictionary versions. A given
version of a dictionary can't be erased as long as there is data
compressed with it. The old data would be decompressed and compressed
again with the most recent dictionary, e.g. during VACUUM or perhaps
VACUUM FULL. This is the approach I ended up using in ZSON.
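
To give an idea of what I mean, every compressed datum would have to
carry something along these lines (the struct and field names are
invented for illustration, this is not from the patch; Oid and uint32
are the usual Postgres typedefs):

typedef struct DictCompressedHeader
{
    Oid         dict_oid;       /* which dictionary was used */
    uint32      dict_version;   /* which version of that dictionary */
    /* compressed payload follows */
} DictCompressedHeader;

VACUUM (or VACUUM FULL) could then re-compress any datum whose
dict_version is older than the current one, and an old version could
be dropped once no datum references it.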

There may be alternative solutions, but I'm not aware of any. (There
are JSON Schema, Protobuf, etc., but they don't work with
general-purpose compression algorithms and/or arbitrary data types.)

> Let's keep improving Postgres for everyone.

Amen.

--
Best regards,
Aleksander Alekseev
