Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-05-30 07:20:51
Message-ID: aDlcU-ym9KfMj9sG@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 27, 2025 at 02:59:17AM -0700, Nikhil Kumar Veldanda wrote:
> typedef struct varatt_external
> {
> int32 va_rawsize; /* Original data size (includes header) */
> uint32 va_extinfo; /* External size (without header) and
> * compression method */
> Oid va_valueid; /* Unique ID within TOAST table */
> Oid va_toastrelid; /* OID of TOAST table containing it */
> /* -------- optional trailer -------- */
> union
> {
> struct /* compression-method trailer */
> {
> uint8 va_ecinfo; /* Extended-compression-method info */
> } cmp;
> } extended; /* “extended” = optional byte */
> } varatt_external;
> ```

Yeah, something like that does make sense to me. If the three bits of
va_extinfo are set, we'd look at the next one. I'll try to think a
bit harder about the structure of varatt.h; that's the important bit
and some of its stuff is outdated. That's not necessarily related to
the patch discussed here.

> I'm proposing not to store any metadata exclusively at varatt_external
> level because storing metadata within varatt_external is not always
> appropriate because in scenarios where datum initially qualifies for
> out-of-line storage but becomes sufficiently small in size after
> compression—specifically under the 2KB threshold(extended storage
> type)—it no longer meets the criteria for external storage.

By metadata, are you referring to the dictionary data, like an ID
pointing to a dictionary stored elsewhere or even the dictionary data
itself? It could be something else, of course. I think that it makes
sense, because we don't need the dictionary to know which code path to
take; the compression method is the only important information to be
able to redirect to the slice, compression or decompression routines.

> Given this behavior, it is more robust to store metadata at the
> varattrib_4b level. This ensures that metadata remains accessible
> regardless of whether the datum ends up stored in-line or externally.
> Moreover, during detoasting it first fetches the external data,
> reconstructs it into varattrib_4b, then decompresses—so keeping
> metadata in varattrib_4b matches that flow.

Okay.

> This is the layout for extra 1 byte in both varatt_external and varattrib_4b.
> ```
> bit 7 6 5 4 3 2 1 0
> +---+---+---+---+---+---+---+---+
> | cmid − 2 | F|
> +---+---+---+---+---+---+---+---+
>
> • Bits 7–1 (cmid − 2)
> – 7-bit field holding compression IDs: raw ∈ [0…127] ⇒ cmid = raw +
> 2 ([2…129])
> • Bit 0 (F)
> – flag indicating whether the algorithm expects metadata
> ```

Yeah, dedicating one bit to this fact should be more than enough, and
the metadata associated to each compression method may differ. I
don't have a lot of imagination on the matter with any other
compression methods floating around in the industry, unfortunately, so
my imagination is limited.

> Introduced metadata flag in the 1-byte layout, To prevent zstd from
> exposing dict or nodict types for ToastCompressionId. This metadata
> flag indicates whether the algorithm expects any metadata or not. For
> the ZSTD scenario, if the flag is set, it expects a dictid; otherwise,
> no dictid is present.
> ```
> typedef enum ToastCompressionId
> {
> TOAST_PGLZ_COMPRESSION_ID = 0,
> TOAST_LZ4_COMPRESSION_ID = 1,
> TOAST_ZSTD_COMPRESSION_ID = 2,
> TOAST_INVALID_COMPRESSION_ID = 3,
> } ToastCompressionId;

This makes sense to me; we should try to untangle ToastCompressionId
the dependency between ToastCompressionId and what's stored on disk
for the purpose of extensibility.

> struct /* Extended compressed in-line format */
> {
> uint32 va_header;
> uint32 va_tcinfo; /* Original data size and method; see va_extinfo */
> uint8 va_ecinfo; /* Algorithm ID (0–255) */
> char va_data[FLEXIBLE_ARRAY_MEMBER];
> } va_compressed_ext;
> } varattrib_4b;
> ```

Yep.

> During compression, compression methods (zstd_compress_datum) will
> determine whether to use metadata(dictionary) or not based on
> CompressionInfo.meta.

Not sure about this one.

> ALTER TABLE tblname
> ALTER COLUMN colname
> SET (zstd_level = 5);
> ```
>
> Since PostgreSQL currently doesn’t expose LZ4 compression levels, I
> propose adding per-column ZSTD compression level settings so users can
> tune the speed/ratio trade-off. I’d like to hear thoughts on this
> approach.

Specifying that as an attribute option makes sense here, but I don't
think that this has to be linked to the initial patch set that should
extend the toast data for the new compression method. It's a bit hard
to say how relevant that is, and IMV it's kind of hard for users to
know which level makes more sense. Setting up the wrong level can be
equally very costly in CPU. For now, my suggestion would be to focus
on the basics, and discard this part until we figure out the rest.

Anyway, I've read through the patches, and got a couple of comments.
This includes a few pieces that we are going to need to make the
implementation a bit easier that I've noticed while reading your
patch. Some of them can be implemented even before we add this extra
byte for the new compression methods in the varlena headers.

+CompressionInfo
+setup_cmp_info(char cmethod, Form_pg_attribute att)

This routine declares a Form_pg_attribute as argument, does not use
it. Due to that, it looks that attoptcache.h is pulled into
toast_compression.c.

Patch 0001 has the concept of metadata with various facilities, like
VARATT_4BCE_HAS_META(), CompressionInfo, etc. However at the current
stage we don't need that at all. Wouldn't it be better to delay this
kind of abstraction layer to happen after we discuss how (and if) the
dictionary part should be introduced rather than pay the cost of the
facility in the first step of the implementation? This is not
required as a first step. The toast zstd routines introduced in patch
0002 use !meta, discard meta=true as an error case.

+/* Helper: pack <flag, cmid> into a single byte: flag (b0), cmid-2
(b1..7) */

Having a one-liner here is far from enough? This is the kind of thing
where we should spend time describing how things are done and why they
are done this way. This is not sufficient, there's just too much to
guess. The fact that we have VARATT_4BCE_EXTFLAG is only, but there's
no real information about va_ecinfo and that it relates to the three
bits sets, for example.

+#define VARTAG_SIZE(PTR) \
[...]
UNALIGNED_U32()

This stuff feels magic. It's hard for someone to understand what's
going on here, and there is no explanation about why it's done this
way.

-toast_compress_datum(Datum value, char cmethod)
+toast_compress_datum(Datum value, CompressionInfo cmp)
[...]
- /* If the compression method is not valid, use the current default */
- if (!CompressionMethodIsValid(cmethod))
- cmethod = default_toast_compression;

Removing the fallback to the default toast compression GUC if nothing
is valid does not look right. There could be extensions that depend
on that, and it's unclear what the benefits of setup_cmp_info() are,
because it is not documented, so it's hard for one to understand how
to use these changes.

- result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
+ result = (struct varlena *) palloc(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE);
[...]
- memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
sizeof(toast_pointer));
+ memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE - VARHDRSZ_EXTERNAL
: TOAST_POINTER_NOEXT_SIZE - VARHDRSZ_EXTERNAL) ;

That looks, err... Hard to maintain to me. Okay, that's a
calculation for the extended compression part, but perhaps this is a
sign that we need to think harder about the surroundings of the
toast_pointer to ease such calculations.

+ {
+ {
+ "zstd_level",
+ "Set column's ZSTD compression level",
+ RELOPT_KIND_ATTRIBUTE,
+ ShareUpdateExclusiveLock
+ },
+ DEFAULT_ZSTD_LEVEL, MIN_ZSTD_LEVEL, MAX_ZSTD_LEVEL
+ },

This could be worth a patch on its own, once we get the basics sorted
out. I'm not even sure that we absolutely need that, TBH. The last
time I've had a discussion on the matter for WAL compression we
discarded the argument about the level because it's hard to understand
how to tune, and the default is enough to work magics. For WAL, we've
been using ZSTD_CLEVEL_DEFAULT in xloginsert.c, and I've not actually
heard much about people wanting to tune the compression level. That
was a few years ago, perhaps there are some more different opinions on
the matter.

+#define COMPRESSION_METHOD_NOT_SUPPORTED(method) \
ereport(ERROR, \
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))

Let's make that a first independent patch that applies on top of the
rest. My original zstd toast patch did the same, without a split.

Your patch introduces a new compression_zstd, touching very lightly
compression.sql. I think that we should and can do much better than
that in the long term. The coverage of compression.sql is quite good,
and what the zstd code is adding does not cover all of it. Let's
rework the tests of HEAD and split compression.sql for the LZ4 and
pglz parts. If one takes a diff between compression.out and
compression_1.out, he/she would notice that the only differences are
caused by the existence of the lz4 table. This is not the smartest
move we can do if we add more compression methods, so I'd suggest the
following:
- Add a new SQL function called pg_toast_compression_available(text)
or similar, able to return if a toast compression method is supported
or not. This would need two arguments once the initial support for
zstd is done: lz4 and zstd. For head, we only require one: lz4.
- Now, the actual reason why a function returning a boolean result is
useful is for the SQL tests. It is possible with \if to make the
tests conditional if LZ4 is supported or now, limiting the noise if
LZ4 is not supported. See for example the tricks we use for the UTF-8
encoding or NUMA.
- Move the tests related to lz4 into a separate file, outside
compression.sql, in a new file called compression_lz4.sql. With the
addition of zstd toast support, we would add a new file:
compression_zstd.sql. The new zstd suite would then just need to
copy-paste the original one, with few tweaks. It may be better to
parameterize that but we don't do that anymore these days with
input/output regression files.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Reda Agaoua 2025-05-30 07:21:49 Re: Suggestion : support for environment variable in initdb to set the superuser password
Previous Message Noboru Saito 2025-05-30 05:27:04 Re: [PATCH] Proposal: Improvements to PDF stylesheet and table column widths