Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-05-07 23:39:17
Message-ID: CAFAfj_GzTtJX_KXOmYY_a+5X=FOAdAo8y0g5PhfimpO91U84oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael, Thanks for the feedback.

On Wed, May 7, 2025 at 12:49 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> I have been reading 0001 and I'm finding that the integration does not
> seem to fit much with the existing varatt_external, making the whole
> result slightly confusing. A simple thing: the last bit that we can
> use is in varatt_external's va_extinfo, where the patch is using
> VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
> know what kind of compression information we should use. This is an
> important point, and it is not documented around varatt_external which
> still assumes that the last bit could be used for a compression
> method. With what you are doing in 0001 (or even 0002), this becomes
> wrong.

This is the current logic used in patch for varatt_external.

When a datum is compressed with an extended algorithm and must live in
external storage, we set the top two bits of
va_extinfo(varatt_external) to 0b11.

To figure out the compression method for an external TOAST datum:

1. Inspect the top two bits of va_extinfo.
2. If they equal 0b11(VARATT_4BCE_MASK), call
toast_get_compression_id, which invokes detoast_external_attr to fetch
the datum in its 4-byte varattrib form (no decompression) and then
reads its compression header to find the compression method.
3. Otherwise, fall back to the existing
VARATT_EXTERNAL_GET_COMPRESS_METHOD path to get the compression
method.

We use this macro VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED to
determine if the compression method is extended or not.

Across the entire codebase, external TOAST‐pointer compression methods
are only inspected in the following functions:
1. pg_column_compression
2. check_tuple_attribute (verify_heapam pg function)
3. detoast_attr_slice (just to check pglz or not)

Could you please help me understand what’s incorrect about this approach?

> Shouldn't we have a new struct portion in varattrib_4b's union for
> this purpose at least (I don't recall that we rely on varattrib_4b's
> size which would get larger with this extra byte for the new extended
> data with the three bits set for the compression are set in
> va_extinfo, correct me if I'm wrong here).
> --

In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
adding it isn’t strictly necessary. If we do want to fold it into the
varattrib_4b union, we could define it like this:

```
typedef union
{
struct /* Normal varlena (4-byte length) */
{
uint32 va_header;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_4byte;
struct /* Compressed-in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size (excludes header) and
* compression method; see va_extinfo */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
} va_compressed;
struct
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size (excludes header) and
* compression method; see va_extinfo */
uint8 cmp_alg;
char cmp_data[FLEXIBLE_ARRAY_MEMBER];
} varatt_cmp_extended;
} varattrib_4b;
```
we don't depend on varattrib_4b size anywhere.

--
Nikhil Veldanda

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-05-07 23:53:35 Re: Improve tab completion for COPY
Previous Message David Rowley 2025-05-07 23:14:48 Improve docs for n_distinct_inherited