From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Nikolay Shaplov <dhyan(at)nataraj(dot)su>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Zstandard support for toast compression
Date: 2022-05-23 13:44:44
Message-ID: CA+TgmoYXFyvJzxF9rqA2nC9qZsoWzW+rPmvtj3uF9+1ertoL=A@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 20, 2022 at 4:17 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> A thought I've had before is that it'd be nice to specify a particular
> compression method on a per-data-type basis. That wasn't the direction
> this was taken in, for various reasons, but I wonder about perhaps still
> having a data type compression method, where perhaps one of these bits
> might mean "the data type's (default?) compression method". Even so,
> having an extensible way to add new compression methods would be a good
> thing.
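
(For context, PostgreSQL 14 already lets you choose a compression method per column and set a global default; a per-type default along the lines suggested above would be new. A rough sketch, where the ALTER TYPE spelling is purely hypothetical:)

    -- existing syntax (PostgreSQL 14+): per-column choice and a global default
    CREATE TABLE t (doc jsonb COMPRESSION lz4);
    ALTER TABLE t ALTER COLUMN doc SET COMPRESSION pglz;
    SET default_toast_compression = 'lz4';

    -- hypothetical: a per-data-type default, as floated above
    -- (this syntax does not exist today; shown only to illustrate the idea)
    ALTER TYPE jsonb SET COMPRESSION lz4;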

If we look at pglz vs. LZ4, there's no real argument that LZ4 makes more
sense for some data types and pglz for others. Indeed, it's unclear why
you would ever use pglz at all if you had LZ4 as an option. Even if we
imagine a world in which a full spectrum of modern compressors is
available - Zstandard, bzip2, gzip, and whatever else you want - the
choice is basically a time/space tradeoff: you either want a fast
compressor or a good one, and that preference has little to do with the
data type being compressed.
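
To see the space side of that tradeoff on a given dataset (assuming
PostgreSQL 14 or newer, where per-column compression and
pg_column_compression() exist), a quick comparison might look like this;
the tables and sample data are made up:

    -- same value stored under each compression method
    CREATE TABLE cmp_pglz (v text COMPRESSION pglz);
    CREATE TABLE cmp_lz4  (v text COMPRESSION lz4);
    INSERT INTO cmp_pglz VALUES (repeat('some moderately compressible text ', 1000));
    INSERT INTO cmp_lz4  VALUES (repeat('some moderately compressible text ', 1000));

    -- which method was used and how many bytes the stored datum takes
    SELECT pg_column_compression(v) AS method, pg_column_size(v) AS bytes FROM cmp_pglz;
    SELECT pg_column_compression(v) AS method, pg_column_size(v) AS bytes FROM cmp_lz4;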

The situation in which this sort of thing might make sense is if we
had a compressor that is specifically designed to work well on a
certain data type, and especially if the code for that data type could
perform some operations directly on the compressed representation.
From what I understand, the ideas that people have in this area around
jsonb require that there be a dictionary available. For instance, you
might scan a jsonb column, collect all the keys that occur frequently,
put them in a dictionary, and then use them to compress the column. I
can see that being effective, but the infrastructure to store that
dictionary someplace is infrastructure we have not got.
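
Just to make the first step concrete: collecting the frequently occurring
keys is easy enough with plain SQL today; it's the storage and lifecycle
of the resulting dictionary that we lack. A sketch, using a hypothetical
docs(payload jsonb) table:

    -- count how often each top-level key appears in a jsonb column
    -- (assumes every payload is a JSON object)
    SELECT k AS key, count(*) AS occurrences
    FROM docs, LATERAL jsonb_object_keys(payload) AS k
    GROUP BY k
    ORDER BY occurrences DESC
    LIMIT 20;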

It may be better to try to handle these use cases by building the
compression into the data type representation proper, perhaps
disabling the general-purpose TOAST compression stuff, rather than by
making it part of TOAST compression. We found during the
implementation of LZ4 TOAST compression that it's basically impossible
to keep a compressed datum from "leaking out" into other parts of the
system. We have to assume that any datum we create by TOAST
compression may continue to exist somewhere in the system long after
the table in which it was originally stored is gone. So, while a
dictionary could be used at compression time, it would have to be done in
a way where that dictionary isn't required for decompression, unless
we're prepared to prohibit ever dropping a dictionary, which sounds
like not a lot of fun. If the compression were part of the data type
instead of part of TOAST compression, we would dodge this problem.
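
To make the "leaking out" problem concrete (a rough illustration, relying
on the fact that an already-compressed datum copied from another table is
not necessarily recompressed; table names are made up):

    CREATE TABLE t1 (doc text COMPRESSION lz4);
    INSERT INTO t1 VALUES (repeat('abc ', 100000));

    CREATE TABLE t2 (doc text COMPRESSION pglz);
    INSERT INTO t2 SELECT doc FROM t1;   -- may carry the LZ4-compressed datum along
    DROP TABLE t1;

    -- t2 can still report lz4 here even though its column says pglz; with a
    -- dictionary-based method, decompression would now depend on a dictionary
    -- whose original table is long gone
    SELECT pg_column_compression(doc) FROM t2;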

I think that might be a better way to go.

--
Robert Haas
EDB: http://www.enterprisedb.com
