Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-05-04 12:54:34
Message-ID: CAFAfj_HPvuzXqbkgQ6F=ocR=JHe9PA7iVazGaCcrtPm-hKZ6pQ@mail.gmail.com
Lists: pgsql-hackers

Hi Robert

> But I don't quite understand the point of this
> response: it seems like you're just restating what the design does
> without really justifying it. The question here isn't whether a 3-byte
> header can describe a length up to 16MB; I think we all know our
> powers of two well enough to agree on the answer to that question. The
> question is whether it's a good use of 3 bytes, and I don't think it
> is.

My initial decision to include a 3‑byte length field was driven by two goals:
1. Avoid introducing separate callbacks for each algorithm.
2. Provide a single, algorithm-agnostic mechanism for handling
metadata length.

After re-evaluating based on your feedback, I agree that the fixed
overhead of a 3-byte length field outweighs its benefit; per-algorithm
callbacks deliver the same functionality while saving three bytes per
datum.

> I did consider the fact that future compression algorithms might want
> to use variable-length headers; but I couldn't see a reason why we
> shouldn't let each of those compression algorithms decide for
> themselves how to encode whatever information they need. If a
> compression algorithm needs a variable-length header, then it just
> needs to make that header self-describing. Worst case scenario, it can
> make the first byte of that variable-length header a length byte, and
> then go from there; but it's probably possible to be even smarter and
> use less than a full byte. Say for example we store a dictionary ID
> that in concept is a 32-bit quantity but we use a variable-length
> integer representation for it. It's easy to see that we shouldn't ever
> need more than 3 bits for that so a full length byte is overkill and,
> in fact, would undermine the value of a variable-length representation
> rather severely. (I suspect it's a bad idea anyway, but it's a worse
> idea if you burn a full byte on a length header.)
>

I agree. Each compression algorithm can decide its own metadata size
overhead, and a callback can report that size instead of storing it in
a fixed 3-byte length field. The revised patch introduces
"toast_cmpid_meta_size(const varatt_cmp_extended *hdr)", which
returns the metadata size for the datum's algorithm.

> But there's an even larger question here too, which is why we're
> having some kind of discussion about generalized metadata when the
> current project seemingly only requires a 4-byte dictionary OID. If
> you have some other use of this space in mind, I don't think you've
> told us what it is. If you don't, then I'm not sure why we're
> designing around an up-to-16MB variable-length quantity when what we
> have before us is a 4-byte fixed-length quantity.

This project only requires 4 bytes of fixed-size metadata to store the
dictionary ID.

Updated design for extending varattrib_4b compression

1. Extensible header

/*
 * varatt_cmp_extended: an optional per-datum header for extended
 * compression method.
 * Only used when va_tcinfo's top two bits are "11".
 */
typedef struct varatt_cmp_extended
{
    uint8 cmp_alg;
    char  cmp_meta[FLEXIBLE_ARRAY_MEMBER]; /* algorithm-specific metadata */
} varatt_cmp_extended;

2. Algorithm registry and metadata size dispatch

static inline uint32
unsupported_meta_size(const varatt_cmp_extended *hdr)
{
    elog(ERROR, "toast_cmpid_meta_size called for unsupported compression algorithm");
    return 0; /* unreachable */
}

/* no metadata for plain-ZSTD */
static inline uint32
zstd_nodict_meta_size(const varatt_cmp_extended *hdr)
{
    return 0;
}

/* dictionary ZSTD stores a fixed-size dictionary OID */
static inline uint32
zstd_dict_meta_size(const varatt_cmp_extended *hdr)
{
    return sizeof(Oid);
}

/*
* TOAST compression methods enumeration.
*
* NAME : algorithm identifier
* VALUE : enum value
* META-SIZE-FN : Calculates algorithm metadata size.
*/
#define TOAST_COMPRESSION_LIST \
    X(PGLZ,        0, unsupported_meta_size) \
    X(LZ4,         1, unsupported_meta_size) \
    X(ZSTD_NODICT, 2, zstd_nodict_meta_size) \
    X(ZSTD_DICT,   3, zstd_dict_meta_size) \
    X(INVALID,     4, unsupported_meta_size) /* sentinel */

/* Compression algorithm identifiers */
typedef enum ToastCompressionId
{
#define X(name,val,fn) TOAST_##name##_COMPRESSION_ID = (val),
    TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/*
 * Lookup table to check if compression method uses extended format.
 * PGLZ and LZ4 don't use the extended format.
 */
static const bool toast_cmpid_extended[] = {
#define X(name,val,fn) \
    [TOAST_##name##_COMPRESSION_ID] = \
        ((val) != TOAST_PGLZ_COMPRESSION_ID && \
         (val) != TOAST_LZ4_COMPRESSION_ID && \
         (val) != TOAST_INVALID_COMPRESSION_ID),
    TOAST_COMPRESSION_LIST
#undef X
};

#define TOAST_CMPID_EXTENDED(alg) (toast_cmpid_extended[alg])

/*
* Prototype for a per-datum metadata-size callback:
* given a pointer to the extended header, return
* how many metadata bytes follow it.
*/
typedef uint32 (*ToastMetaSizeFn) (const varatt_cmp_extended *hdr);

/* Callback table—indexed by ToastCompressionId */
static const ToastMetaSizeFn toast_meta_size_fns[] = {
#define X(name,val,fn) [TOAST_##name##_COMPRESSION_ID] = fn,
    TOAST_COMPRESSION_LIST
#undef X
};

/* Calculates algorithm metadata size */
static inline uint32
toast_cmpid_meta_size(const varatt_cmp_extended *hdr)
{
    Assert(hdr != NULL);
    return toast_meta_size_fns[hdr->cmp_alg] (hdr);
}

Each compression algorithm provides a static callback that returns the
size of its metadata, given a pointer to the varatt_cmp_extended
header. Algorithms with fixed-size metadata return a constant, while
algorithms with variable-length metadata are responsible for defining
and parsing their own internal headers to compute the metadata size.
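
As an illustration of the variable-length case, here is a minimal
standalone sketch; it is not part of the patch. It assumes a
hypothetical algorithm whose metadata is a single LEB128-style varint,
so the callback finds the metadata length by scanning for the first
byte whose high bit is clear. The names demo_hdr,
demo_varint_meta_size and demo_varint_decode, and the choice of varint
encoding, are assumptions made for this example; stdint types stand in
for PostgreSQL's uint8/uint32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for varatt_cmp_extended (hypothetical; uses stdint types). */
typedef struct demo_hdr
{
    uint8_t cmp_alg;
    uint8_t cmp_meta[8];    /* fixed-size buffer only to keep the demo simple */
} demo_hdr;

/*
 * Hypothetical callback for an algorithm whose metadata is one
 * LEB128-style varint: each byte carries 7 payload bits, and a set high
 * bit means "another byte follows".  A 32-bit value needs at most 5 bytes,
 * so the scan never looks past the fifth byte.
 */
static uint32_t
demo_varint_meta_size(const demo_hdr *hdr)
{
    uint32_t n = 0;

    while (n < 4 && (hdr->cmp_meta[n] & 0x80) != 0)
        n++;
    return n + 1;           /* count the terminating byte as well */
}

/* Decode the varint that the callback above measured. */
static uint32_t
demo_varint_decode(const demo_hdr *hdr)
{
    uint32_t len = demo_varint_meta_size(hdr);
    uint32_t value = 0;

    for (uint32_t i = 0; i < len; i++)
        value |= (uint32_t) (hdr->cmp_meta[i] & 0x7F) << (7 * i);
    return value;
}
```

With this encoding a small identifier costs one metadata byte and no
separate length field, which is the self-describing property discussed
above.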

3. Resulting on-disk layouts for zstd

ZSTD (nodict): datum on-disk layout

+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8)                  | ← (ZSTD_NODICT)
+----------------------------------+
| compressed bytes …               | ← ZSTD frame
+----------------------------------+

ZSTD (dict): datum on-disk layout

+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               | ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg (uint8)                  | ← (ZSTD_DICT)
+----------------------------------+
| dict_id (uint32)                 | ← dictionary OID
+----------------------------------+
| compressed bytes …               | ← ZSTD frame
+----------------------------------+
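
To make the two layouts concrete, the following standalone sketch
computes where the ZSTD frame starts in each case. It is not taken from
the patch: the demo_* names are hypothetical, stdint types stand in for
PostgreSQL's, and the buffer is assumed to hold the datum exactly as
drawn above. Because the one-byte cmp_alg leaves dict_id unaligned, the
sketch copies it out with memcpy instead of dereferencing a cast
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from TOAST_COMPRESSION_LIST in this mail. */
enum { DEMO_ZSTD_NODICT = 2, DEMO_ZSTD_DICT = 3 };

/* va_header (4 bytes) + va_tcinfo (4 bytes) precede the extended header. */
#define DEMO_FIXED_HDR_SIZE ((size_t) 8)

/*
 * Metadata bytes that follow cmp_alg; mirrors the per-algorithm callbacks.
 * Returns -1 for ids that do not use the extended format in this demo.
 */
static int
demo_meta_size(uint8_t cmp_alg)
{
    switch (cmp_alg)
    {
        case DEMO_ZSTD_NODICT:
            return 0;
        case DEMO_ZSTD_DICT:
            return (int) sizeof(uint32_t);
        default:
            return -1;
    }
}

/* Offset of the ZSTD frame from the start of the datum, or -1 on error. */
static int
demo_payload_offset(const uint8_t *datum, size_t len)
{
    int meta;

    if (len < DEMO_FIXED_HDR_SIZE + 1)
        return -1;          /* too short to contain cmp_alg */
    meta = demo_meta_size(datum[DEMO_FIXED_HDR_SIZE]);
    if (meta < 0 || len < DEMO_FIXED_HDR_SIZE + 1 + (size_t) meta)
        return -1;          /* unknown id or truncated metadata */
    return (int) (DEMO_FIXED_HDR_SIZE + 1) + meta;
}

/* Read the dictionary OID of a ZSTD_DICT datum; returns 1 on success. */
static int
demo_read_dict_id(const uint8_t *datum, size_t len, uint32_t *dict_id)
{
    if (len < DEMO_FIXED_HDR_SIZE + 1 + sizeof(uint32_t) ||
        datum[DEMO_FIXED_HDR_SIZE] != DEMO_ZSTD_DICT)
        return 0;
    /* dict_id sits at an odd offset, so copy it instead of casting */
    memcpy(dict_id, datum + DEMO_FIXED_HDR_SIZE + 1, sizeof(uint32_t));
    return 1;
}
```

Under these assumptions the frame starts at byte 9 for ZSTD_NODICT and
at byte 13 for ZSTD_DICT.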

I hope this updated design addresses your concerns. I would appreciate
any further feedback you may have. Thanks again for your guidance—it's
been very helpful.

v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, metadata size
dispatch and useful macros; behaviour unchanged.
v20-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support.

--
Nikhil Veldanda

Attachment                                                       Content-Type         Size
v20-0002-zstd-nodict-compression.patch                           application/x-patch  37.7 KB
v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch  application/x-patch  19.2 KB
