Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-06-11 02:42:02
Message-ID: aEjs-mTitgkghgmY@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 05, 2025 at 12:03:49AM -0700, Nikhil Kumar Veldanda wrote:
> Agreed. I introduced pg_compression_available(text) and refactored the
> SQL tests accordingly. I split out LZ4 tests into compression_lz4.sql
> and created compression_zstd.sql with the appropriate differences.
>
> v25-0001-Add-pg_compression_available-and-split-sql-compr.patch -
> Introduced pg_compression_available function and split sql tests
> related to compression

I like that as an independent piece because it's going to help a lot
in having new compression methods, so I'm looking forward to getting
that merged into the tree for v19. It can be split into two
independent pieces:
- One patch for the addition of the new function
pg_compression_available(), to detect which compression are supported
at binary level to skip the tests.
- One patch to split the LZ4-only tests into its own file.

The split of the tests is not completely clean as presented in your
patch, though. Your patch only does a copy-paste of the original
file. Some of the basic tests of compression.sql check the
interactions between the use of two compression methods, and the
"basic" compression.sql could just cut them and rely on the LZ4
scripts to do the job, because we want two active different
compression methods for these scenarios. For example, cmdata1
switched to use pglz has little uses. The trick is to have a minimal
set of tests to minimize the run time, while we don't lose in
coverage. Coverage report numbers are useful to compile when it comes
to such exercises, even if it can be an ant's work sometimes.

+ * pg_compression_available(text) → bool

Non-ASCII characters added in the code comments.

+#include "fmgr.h"
+#include "parser/scansup.h"
+#include "utils/builtins.h"

Include file order.

> v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch -
> Design proposal for varattrib_4b & varatt_external
> v25-0003-Implement-Zstd-compression-no-dictionary-support.patch - zstd
> no dictionary compression implementation

About this part, I am not sure yet. TBH, I've been working on this
the code for a different proposal in this area, because I've been
reminded during pgconf.dev that we still depend on 4-byte OIDs for
toast values, and we have done nothing about that for a long time.

If I'm able to pull this off correctly, modernizing the code on the
way, it should make additions related to the handling of different
on-disk varatt_external easier; the compression handling is a part of
that. So yes, that's related to varatt_external, and how we handle
it in the core code in the toasting and detoasting layers. The
difficult part is finding out how a good layer should look like,
because there's a bunch of hardcoded knowledge related to on-disk
TOAST Datums and entries, like the maximum chunk size (control file)
that depends on the toast_pointer, pointer alignment when inserting
the TOAST datums, etc. A lot of these things are close to 20 years
old, we have to maintain on-disk compatibility while attempting to
extend the varatt_external compatibility and there have been many
proposals that did not make it. None of them were really mature
enough in terms of layer deinision. Probably what I'm doing is going
to be flat-out rejected, but we'll see.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2025-06-11 02:49:37 Re: Extend ALTER DEFAULT PRIVILEGES for large objects
Previous Message Steven Niu 2025-06-11 02:27:43 Re: [PATCH] Refactor: Extract XLogRecord info