Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Dharin Shah <dharinshah95(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
Date: 2025-12-18 22:44:03
Message-ID: aUSDs7SaEcOa65gD@paquier.xyz
Lists: pgadmin-hackers pgsql-hackers

On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:
> I want to make sure I understand your main point: you're OK with a new
> `vartag_external`, but prefer we avoid increasing the heap TOAST pointer
> from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
> the main heap tuple.

That would be my choice, yes. Not sure about the opinion of others on
this matter.

> I also realize the "compatibility" of the extended header doesn't buy us
> much — we'll need to support the existing 16-byte varatt_external forever
> for backward compatibility. Adding a 20-byte structure just means two
> formats to maintain indefinitely.

Yes. Patches have to maintain on-disk compatibility.

> A couple clarifying questions if we go with new vartag (e.g.,
> `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
> discriminator
> 1. How should we handle future methods beyond zstd? One tag per method, or
> store a method id elsewhere (e.g., in TOAST chunk header)?

My suspicion is that we could use a new set of vartags in the future,
one for each compression method. When it comes to zstd there is
something that comes into play: we could set some bits related to
dictionaries at tuple level. Not sure if this is the best design or
if using an attribute-level option is better suited (for example, a
JSONB attribute could keep its common keys in a dictionary, saving a
lot of on-disk space even before compression), but keeping some bits
free in the 16-byte header leaves this option open with a new
vartag_external. That said, zstd is good enough that I strongly
suspect that we would not regret it for quite a few years. One issue
that has pushed towards the addition of lz4 as an option for toast
compression is that pglz was worse in terms of CPU cost. zlib is also
more expensive than lz4 or zstd, especially at very high compression
levels, for usually little compression gain.
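
To illustrate the "one vartag per method, same 16-byte payload" idea,
here is a minimal standalone sketch. It is not PostgreSQL code: the
struct only mirrors the field layout of varatt_external, the numeric
value given to the zstd tag is hypothetical, and the helper and enum
names are made up for this example. It also assumes the value is
compressed, which the real code checks separately.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for varatt_external: four 4-byte fields. */
typedef struct
{
    int32_t  va_rawsize;     /* original data size, including header */
    uint32_t va_extinfo;     /* 30-bit external size + 2 method bits */
    uint32_t va_valueid;     /* TOAST value id (an Oid in PostgreSQL) */
    uint32_t va_toastrelid;  /* TOAST relation id (an Oid in PostgreSQL) */
} toast_pointer;

/* The payload stays at 16 bytes whichever tag precedes it. */
_Static_assert(sizeof(toast_pointer) == 16, "payload must be 16 bytes");

typedef enum
{
    TAG_ONDISK = 18,         /* same value as VARTAG_ONDISK */
    TAG_ONDISK_ZSTD = 19     /* hypothetical value, for illustration only */
} toast_tag;

typedef enum
{
    METHOD_PGLZ = 0,         /* existing 2-bit id stored in va_extinfo */
    METHOD_LZ4 = 1,          /* existing 2-bit id stored in va_extinfo */
    METHOD_ZSTD,             /* sketch-only id, never stored in va_extinfo */
    METHOD_UNKNOWN
} toast_method_id;

/*
 * Pick the decompression method.  The tag is the discriminator: a
 * method-specific tag decides on its own, while the legacy tag falls
 * back to the top two bits of va_extinfo.
 */
static toast_method_id
toast_method(toast_tag tag, const toast_pointer *ptr)
{
    switch (tag)
    {
        case TAG_ONDISK_ZSTD:
            return METHOD_ZSTD;
        case TAG_ONDISK:
            switch (ptr->va_extinfo >> 30)
            {
                case 0:
                    return METHOD_PGLZ;
                case 1:
                    return METHOD_LZ4;
                default:
                    return METHOD_UNKNOWN;
            }
    }
    return METHOD_UNKNOWN;
}
```

With this shape the two upper bits of va_extinfo remain unused under
the new tag, which is what would leave room for something like
dictionary-related bits later on.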

> 2. And re: "as long as the TOAST value is 32 bits" — are you referring to
> the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
> extsize for method encoding)?

I mean extending the TOAST value to 8 bytes, as per the following
issues:
https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
https://commitfest.postgresql.org/patch/5830/
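
For reference, the 30-bit extsize field from the question shares the
32-bit va_extinfo word with a 2-bit compression method id. A minimal
standalone sketch of that packing follows; the macro and function
names are invented for this example, and only the 30/2 bit split is
taken from the existing format.

```c
#include <assert.h>
#include <stdint.h>

#define EXTSIZE_BITS 30
#define EXTSIZE_MASK ((1U << EXTSIZE_BITS) - 1)

/* Pack an external size and a 2-bit method id into one 32-bit word. */
static uint32_t
extinfo_pack(uint32_t extsize, uint32_t method)
{
    assert(extsize <= EXTSIZE_MASK);   /* size must fit in 30 bits */
    assert(method <= 3);               /* method must fit in 2 bits */
    return extsize | (method << EXTSIZE_BITS);
}

/* Extract the external (stored) size from the low 30 bits. */
static uint32_t
extinfo_size(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

/* Extract the compression method id from the top 2 bits. */
static uint32_t
extinfo_method(uint32_t extinfo)
{
    return extinfo >> EXTSIZE_BITS;
}
```

Only four method ids fit in those two bits, so taking more of them
from extsize would shrink the maximum representable size.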

> *Key findings (i guess well known at this point):*
> - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
> - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
> - The T4 result showing zstd as "worse" is not about compression quality -
> it's about missing inline storage support. ZSTD actually compresses better,
> but pays unnecessary TOAST overhead.
>
> I'll share the detailed benchmark script with the next patch revision. But
> also a potential path forward could be that we could just fully replace
> pglz (can bring it up later in different thread)

I don't think that we will ever be able to remove pglz. It would be
nice as a final result, of course, but I also expect that not being
able to decompress pglz data is going to lead to a lot of user pain.
It would also be very expensive to check at upgrade for large
instances.

> *On Testing and Patch Structure*
> Agreed on both points:
> - I'll use `compression_zstd.sql` following the `compression_lz4.sql`
> pattern (removing the test_toast_ext module)

Okay.

> - I'll split the GUC refactoring into a separate preparatory patch

This refactoring, if done nicely, is worth an independent piece. It's
something that I have actually done for the sake of the other thread,
though the result was not really much liked by others. Perhaps I'm
just lacking imagination with this abstraction, and I'd surely welcome
different ideas.
--
Michael
